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ABSTRACT 


The  dissertation  focuses  on  the  steady-state  solution  to  the  linear 
estimation  and  control  problems.  Following  a  pertinent  review  of  linear 
algebra  and  computation  fundamentals,  various  approaches  to  solving  the 
matrix  Riccati  equation  are  examined.  After  reviewing  eigenvector 
decomposition  and  various  iterative  algorithms,  square  root  doubling 
algorithms  are  motivated.  Scattering  theory  is  used  to  initiate  an 
algebraic  derivation  of  these  algorithms,  and  is  then  extended  to  provide 
a  pure  derivation  of  several  square  root  algorithms. 

A  set  of  criteria  for  choosing  among  these  algorithms  is  presented 
and  examined  with  empirical  evaluations.  Differentiating  criteria 
include  sensitivity  to  repeated  closed-loop  eigenvalues,  the  impact  of 
singular  model  parameters,  computational  accuracy,  and  rate  of 
convergence . 

Sensitivity  to  parameter  uncertainty  in  discrete-time  systems  is 
considered  using  a  quadratic  minimization  of  a  generalized  cost  func¬ 
tion.  This  same  algorithm  is  used  to  design  arbitrary-order  compensa¬ 
tion  using  complete  system  information. > 

Algorithm  implementation  is  then  considered  in  terms  of  both  present 
and  future  hardware. 


!ion, 


In  conclusion,  fbr  'discrete-time  systems,  the  square  root  doubling 
algorithms  are  found  to  be  comparable  to  the  eigenvector  decomposition 
algorithms  in  terms  of  memory  requirements,  elapsed  computation  time, 
and  performance.  The  doubling  algorithms  are  also  found  to  solve  a 
wider  class  of  problems. 
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Chapter  I 
INTRODUCTION 

Consideration  of  how  to  interpret  measurements  and  of  how  to  make 
predictions  based  on  these  measurements  has  a  long,  colored,  and  oft 
obscure  history  ("Hamlet's  Mill,"  Santillana  [1969]).  It  has  been  a 
primary  concern  of  orderly  enquiry  since  at  least  the  earliest  mega- 
lithic  cultures.  But  only  much  later,  with  the  dawning  of  modern  science 
in  the  late  middle  ages,  did  measurement  and  considered  prediction 
become  a  central  paradigm  in  rational  enquiry  ("The  Waning  of  the  Middle 
Ages,"  Huizinga  [1924]),  ("The  Civilization  of  the  Renaissance  in  Italy," 
Burckhardt  [I860]), 

Refinements  have  often  come  about  in  unexpected  ways,  many  of  them 
fascinating.  Sir  Isaac  Newton  considered  the  verification  of  the  law 
of  universal  gravitation  fundamental  to  his  theory  of  motion--  so  funda¬ 
mental  he  strove  for  unparalleled  precision.  Beginning  with  data  accurate 
to  five  percent,  Newton  proceeded  to  derive  a  correlation  between  the  accel¬ 
eration  of  gravity  and  the  earth's  attraction  for  the  moon  accurate  to  one 
part  in  3000  [Newton,  1726]  --  a  persuasive  correlation!  Unfortunately, 
his  editor  (Professor  Roger  Cotes)  found  a  grievous  inconsistency  in 
the  calculation,  and  replied  [Edleston,  1850] 

I  have  considered  how  to  make  [it]  appear  to  the 
best  advantage  as  to  ye  numbers,  and  I  propose 
to  alter  it  thus. 

With  unbridled  faith  in  his  mentor's  theory.  Cotes  proceeded  to  alter 
the  latitude  of  Paris.  Sir  Isaac  declined  the  offer  (choosing  instead 
to  alter  the  orbit  of  the  moon). 


Measurement  and  estimation  continued  to  advance  in  the  early 
eighteenth  century,  but  in  1795  Gauss  introduced  linear  least-squares 
estimation,  and  progress  began  in  earnest.  The  past  forty  years  has 
seen  a  burgeoning  of  the  field,  beginning  with  work  by  Kolmogorov, 

Krein,  and  Wiener.  For  a  comprehensive  history,  see  Kailath  [1974]. 

Estimation  theory,  in  its  modern  context,  addresses  the  issue  of 
making  predictions  from  observations.  We  begin  by  assuming  a  stochastic 
process,  x(*)  ,  and  then  try  to  prognosticate  inferences  about  x 
based  upon  measurements  of  a  related  process  y(*)  .  In  linear  esti¬ 
mation  theory,  these  inferences  are  linear  functionals  of  the  measure¬ 
ments.  If  the  objective  is  to  estimate  x  ,  and  if  the  performance 
criterion  is  to  minimize  the  mean  square  error  between  x  and  its 
estimate,  x  ,  then  the  solution  simplifies.  The  theory  becomes  quite 
tractable,  depending  solely  on  the  first  and  second  moments  of  the 
original  and  of  the  observed  process. 

In  the  discrete-time  case,  the  solution  follows  immediately  with 
the  inversion  of  the  observation  process  covariance  matrix.  However, 
since  n  observations  require  on  the  order  of  n^  operations  for 
the  inversion,  this  can  become  a  Herculean  task  for  even  the  largest 
computer. 

An  alternate  approach  uses  recursive  filters,  which  update  state 
information  as  new  observations  become  available.  There  are  several 
formulations,  but  Kalman  filters  are  the  best  known  of  these  formulae. 
Using  the  state  space  model,  Kalman  [1960]  [1963]  formulated  the  process 
as  a  linear  dynamical  system  driven  by  white  noise.  The  observation 
process,  y,  is  a  function  of  the  state  process,  x,  and  is  similarly 


corrupted  by  white  noise. 


Arising  in  the  solution  of  this  recursive  problem  is  a  nonlinear, 
matrix  difference  equation,  an  example  of  the  Riccati  type  equations. 

The  solution  to  this  equation  is  of  primary  interest  in  the  sequel. 

This  dissertation  will  focus  on  discrete-time  problems;  the  signal 
importance  digital  computers  have  attained  in  our  current  technology 
often  obviates  continuous  alternatives.  Further,  the  benefits  of  digital 
implementations  are  persuasive.  Costs  can  be  reduced,  flexibility  is 
increased,  greatly  increased  accuracy  is  possible,  and  model  complexity 
can  often  be  incorporated  readily.  There  are  also  considerable 
disadvantages,  arising  because  of  the  interface  with  a  continuous 
world. 

Two  primary  factors  in  determining  costs,  in  both  the  design  and 
the  implementation  phase,  are  speed  and  complexity;  if  a  system  is  slow 
and  simple,  then  it  is  usually  less  expensive.  In  the  implementation 
context,  reducing  cost  implies  reducing  the  order  of  the  compensation,  the 
accuracy  of  the  computation,  and  reducing  the  sample  rate  (usually  at  the 
expense  of  the  system  bandwidth  and  sensitivity).  In  the  design  context, 
reducing  cost  implies  faster,  simpler  algorithms  and  synergetic  hardware. 

In  the  sequel,  various  design  algorithms  are  considered  from  several 
different  analytical  perspectives.  Both  theoretical  and  empirical 
comparisons  are  made.  Sensitivity  and  compensation  complexity  are  also 
considered.  The  relationship  between  algorithms  and  hardware  is  presented 


as  the  denouement. 


Dissertation  Outline 


Chapter  2  is  a  brief  and  fast-paced  review  of  linear  algebra, 
numerical  analysis, and  fundamental  algorithms.  Definitions  are 
presented,  concepts  are  outlined,  and  references  are  given  for  additional 
background,  if  required. 

Chapter  3  provides  the  gist  of  the  dissertation.  Half  of  the 
chapter  is  a  review  of  earlier  work.  It  begins  by  developing  initial 
criteria  for  gauging  the  algorithms  of  the  sequel;  standard  iteration 
of  the  matrix  Riccati  equation  is  prescribed  as  a  benchmark.  Next,  a 
standard  algorithm  for  solving  the  steady-state  Riccati  equation  is 
reviewed  (eigenvector  decomposition) . 

Before  presenting  another  class  of  algorithms,  orthogonal  trans¬ 
formations  are  considered  in  pertinent  detail;  these  mappings  are  then 
used  in  an  algebraic  derivation  of  square  root  algorithms.  Several 
questions  arise,  and  are  deferred. 

Scattering  theory  is  then  introduced  and  applied  to  the  iteration 
of  the  Riccati  equation  and  to  the  eigenvector  decomposition  developed 
earlier.  Scattering  theory  provides  the  impetus  for  developing  the  next 
set  of  algorithms,  the  square  root  doubling  algorithms.  They  are  found  to 
be  fast,  widely  applicable,  and  numerically  well  behaved,  but  their  deri¬ 
vation  again  raises  questions  that  are  deferred. 

To  answer  these  questions,  and  to  gain  insight  into  the  underlying 
structure  of  square  root  algorithms,  the  concept  of  the  square  root  of 
a  scattering  medium  is  developed.  The  ensuing  physical  insight  clarifies 
many  of  the  unresolved  issues,  and  provides  for  rapid  derivation  of 
several  of  the  square  root  algorithms. 


The  chapter  ends  after  proposing  possible  remedies  for  several  of 
the  algorithm's  weaknesses. 

Chapter  4  briefly  examines  the  connection  between  the  continuous 
and  the  discrete  solution  to  the  matrix  Riccati  equation.  Various 
doubling  formulae  are  examined,  with  varying  degrees  of  disfavor. 

In  Chapter  5  the  problem  of  a  singular  state  transition  matrix  is 
examined  from  several  perspectives.  Gever's  [1972]  work  on  order  reduc¬ 
tion  is  reviewed,  as  are  Katz's  [1974]  and  Powell's  [1978]  conjectures. 
Pure  prediction,  singular  covariance  matrices,  and  repeated  closed- loop 
eigenvalues  are  also  considered. 

Chapter  6  presents  empirical  results  comparing  the  algorithms  of 
Chapters  three  and  four.  Memory  requirements,  elapsed  execution  time, 
speed  of  convergence,  and  accuracy  are  the  primary  criteria  for  evalua¬ 
tion  of  the  results. 

Chapter  7  considers  a  procedure  for  reducing  the  sensitivity  of 
compensation  to  the  effects  of  uncertainties  in  the  parameters  of  the 
model  of  the  system.  This  procedure  is  compared  to  an  ad^  hoc  approach. 
Limitations  are  delineated. 

Chapter  8  uses  the  algorithm  of  the  previous  chapter  to  design 
compensation  of  arbitrary  order.  The  results  are  compared  to  frequency 
domain  design,  and  are  applied  to  a  practical  problem. 

Chapter  9  discusses  the  problems  associated  with  algorithm  imple¬ 
mentation.  The  focus  is  on  the  weaknesses  and  false  promises  of  present 
hardware,  the  need  for  considerable  research  into  algorithm-hardware 
interaction,  and  for  the  development  of  new,  specialized  computers. 


Chapter  II 
MATRIX  FUNDAMENTALS 
A  Focused  Review 

Fundamental  matrix  properties  and  the  problems  inherent  in 
actually  computing  with  arrays  lies  at  the  heart  of  this  treatise.  For 
that  reason  a  brief  discussion  of  matrices,  computation,  and  algorithms 
will  follow. 

The  need  for  specific  algorithms  to  solve  specific  problems 
dictated  the  contents  of  this  chapter.  We  assume  a  working  knowledge 
of  matrices,  review  our  definitions,  and  then  concentrate  on  the  details 
and  their  implications  for  computation.  The  objective  is  to  review  the 
quintessential  algorithms,  examining  why  they  are  relevant,  and  what 
we  can  expect  from  them.  Lesser  algorithms--  for  example,  Gram- 
Schmidt  orthogonalization  and  Gaussian  Elimination--  will  be  mentioned 
along  the  way,  often  in  a  quite  familiar  fashion.  If  these  references-- 
to  pivoting,  poorly  bounded  errors,  etc. --are  not  well  understood, 
these  secondary  references  are  not  crucial  to  an  understanding  of  the 
remainder  of  this  treatise. 

This  is  a  field  rich  with  fine  work  and  excellent  texts.  Many 
must  be  consulted,  however,  to  gamer  the  full  flavor  of  the  field. 

James  Hardy  Wilkinson  IWilkinson,  1965]  is  preeminent  in  the  field,  as 
is  Gene  Golub  [1965].  Also  consider  Forsythe  [1967],  Moler  [1974], 

Acton  [1970],  Lawson  [1974],  and  Strang  [1976]. 
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II . 2  Fundamentals 

Computation--  the  Theory  and  the  Practice 

The  primary  constraint  computers  place  on  purely  theoretical 
algorithms  is  the  unavailability  of  real  real  numbers.  Instead,  there 
are  integers  and  there  are  ersatz  real  numbers,  termed  floating  point 
numbers.  A  variety  of  ersatz  schemes  exist,  but  conceptually  the  real 
number  is  represented  in  scientific  notation,  with  a  truncated  frac¬ 
tional  part  and  a  bounded  exponent. 

Thus,  for  example,  we  might  require  the  fractional  part  to  be 
greater  than  one-half,  but  less  than  one,  and  limited  to  seven  digits 
of  precision.  The  exponent  part  might  be  limited  to  three  digits  of 
precision.  Thus,  we  might  have 

ffjfj  ...  f?  x  ge  (2.1) 

where  B  is  the  base  (2,8,10,  or  16  are  common),  f  is  the  fractional 
part,  and  e  the  exponent. 

First,  we  note  that  we  have  specified  a  finite  set  on  the  real 
line  of  far  from  equidistant  points;  small  numbers  are  densely 
represented;  large  numbers  are  comparatively  sparsely  represented. 
Second,  most  real  numbers  must  be  represented  only  approximately, 
especially  if  the  number  lies  outside  the  range  of  the  exponent. 

The  fundamental  theorem  in  floating-point  rounding-error  analysis 
defines  the  relative  error;  following  Moler  [1967] 

if  we  replace  a  real  number  x  by  the  closest  floating  point 

approximation,  x£  ,  then 

Xf  -  x(l+<5) 
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where 

|6|  £  t  ,  t:  the  number  of  digits  of  precision. 

(2.2) 

This  result  can  be  extended  to  bound  various  floating  point  operations, 
such  as  addition  or  multiplication,  and  therefore  motivates  our  later 
introduction  of  matrix  conditioning. 

Mappings 

The  square  matrix  A  represents  a  linear  mapping,  or  transforma¬ 
tion,  of  each  vector  x  of  one  n-dimensional  space  X  into  the  vector 
y  =  Ax  of  a  second  n-dimensional  space  Y  .  Mappings--  especially 
orthogonal  mappings--  are  very  important  in  what  follows. 

Singular  values,  Eigenvalues,  and  Eigenvectors 

The  fundamental  algebraic  eigenvalue  problem  is  the  determination 
of  those  scalars  A  for  which  the  equation 

Ax  =  Ax  (2.3) 

has  a  non-trivial  solution.  The  general  theory  of  simultaneous  linear 
algebraic  equations  shows  that  there  is  a  non-trivial  solution  if,  and 
only  if,  the  matrix  (AI-A)  is  singular.  This  may  be  restated  as 
requiring  that 


det  (AI-A)  =  0 

This  determinant  may  be  expanded  into  the  polynomial 


(2.4) 


0 


(2.5) 
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where  n  is  the  order  of  A.  This  equation  always  has  n  roots, 
called  the  eigenvalues  of  the  matrix  A. 

Corresponding  to  any  eigenvalue  X  ,  the  set  of  equations  (2.3) 
has  at  least  one  non-trivial  solution  x  ;  this  solution  is  called  an 
eigenvector  corresponding  to  that  eigenvalue.  If  the  rank  of  (XI-A) 
is  less  than  (n-1)  ,  then  there  will  be  more  than  one  independent  vector 
satisfying  (2.3).  Also,  if  x  is  a  solution  to  (2.3),  then  cx  is 
also  a  solution,  where  c  is  a  scalar. 

Not  all  systems  have  a  complete  set  of  eigenvectors.  For  example, 

the  matrix  A  ,  where 
r 


has  the  repeated  root  r  times,  but  has  only  one  eigenvector 


x  =  [1  0  . . .  0]1 


We  will  return  to  this  issue  later. 

A  matrix  is  positive  definite,  A  >  0  ,  if  all  eigenvalues  are 

greater  than  zero;  positive  semidefinite  or  non-negative  definite, 

A  >_  0  ,  if  none  of  the  eigenvalues  are  negative. 

Singular  values  are  the  non-negative  square  roots  of  the  eigenvalues 

T 

of  the  symmetric  matrix  AA  ,  and  are  denoted  jjl.  Practically  all  of 
the  important  properties  concerning  the  solution  of  a  system  of  equations 
with  the  matrix  A  are  determined  by  the  nature  of  the  matrix's  singular 
values  (Forsythe  [67]);  convergence  and  accuracy  are  often  two  major  pro¬ 
perties  . 
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For  non-negative  definite,  symmetric  matrices,  the  eigenvalues 
and  singular  values  correspond. 

Matrix  Transformations--  the  Definitions 

Practical  strategies  for  computing  eigenvalues  break  the  problem 
into  cwo  parts:  reducing  the  original  matrix  to  a  specialized  matrix 
having  many  zero  elements  but  the  same  eigenvalues,  followed  by  the 
finding  of  the  eigenvalues  for  the  specialized  matrix.  The  square  root 
algorithms  presented  in  the  sequel  also  begin  by  reducing  the  system 
to  specialized  forms  as  an  initial  step. 

Reductions  of  general  matrices  may  be  phrased  in  terms  of 
similarity  transformations.  The  matrix  A  is  said  to  "suffer"  a  similarity 
transformation  if  it  is  pre-  and  post-  multiplied  by  any  other  matrix 
and  its  inverse;  the  only  restriction  is  that  the  inverse  must  exist: 

B  =  TAT-1 
C  =  T-1AT 

Similarity  transformations  have  the  property  that  they  preserve  eigen¬ 
values  . 

An  important  class  of  similarity  transformations  are  orthogonal 
transformations.  The  transforming  matrix  T  is  orthogonal  if  its 
transpose  is  also  its  inverse 


This  property  is  enough  to  preserve  symmetry  through  the  similarity 
transformation.  More  importantly,  orthogonal  transformations  are 
extremely  stable  (see  below). 
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Orthogonal  transformations  consist  of  a  possible  reflection  in  some 
hyperplane,  followed  by  a  rigid  rotation  of  n-dimensional  space  onto  it¬ 
self.  The  key  facts  to  remember  concerning  orthogonal  transformations  are 

1)  They  are  simple  to  generate, 

2)  They  are  useful  for  creating  blocks  of  zeroes, 

3)  They  are  useful  for  creating  triangular  forms, 

4)  They  preserve  eigenvalues  and  symmetry* 

5)  Their  errors  are  bounded  and  well-behaved. 

They  also  generally  require  more  computation  than  their  competition. 

Norms 

Norms  are  measures  of  length  and  distortion.  A  common  norm  is 
Euclidean  distance,  denoted  by  j|x||  ,  which  we  define  as 

INI  =  A\\2  +  !  x2 1 2  +  ...  |xn|2  =  /xT  (2.8) 

This  norm  obeys  our  common  sense  feeling  for  determining  distance,  but 
it  is  often  difficult  to  compute;  computers  are  notoriously  slow  at 
performing  square  roots . 

The  Euclidean  norm  obeys  the  following  properties  in  two  and  three 
dimensions : 

||cx||  =  | c |  •  ||xj|  V  real  c  ,  and  all  vectors  x  (2.9) 

||0||  =  0  and  ||x||  >  0  iff  x  ft  0  ,  where  0  is  the  null 

vector  (2.10) 

||x+y||  £  j[ x ||  +  || y ||  for  all  vectors  x  and  y  .  (2.11) 

If  we  take  these  properties  as  our  definition  of  a  norm,  we  can  generate 
several  measures  which  are  more  readily  computable.  For  example 


IWIj  •  Z  Ixjl 

1  i=l  1 

IML  =  max  lxJ 

l<i±n 

Both  of  these  norms  satisfy  our  formal  requirements  for  a  measure,  although 
they  don't  allow  us  to  use  our  geometric  (Euclidean)  intuition.  More 
importantly,  they  are  both  readily  computable,  and  in  the  appropriate 
context  each  can  be  used  to  answer  the  question,  "which  is  smaller." 

We  now  define  the  norm  of  a  square  matrix  A  in  the  obvious  fashion-- 
as  a  direct  extension  of  the  vector  definition: 


1 1 A 1 1  =  max  - 
x/0 

As  a  direct  consequence 

Ax 

X 

J-  =  max 

'  IM|. 

II  Ax  || 

=  1 

(2.14) 

||cA||  =  |  c  |  •  ||  A|| 

V  real  c  and 

all  A 

(2.15) 

II G II  =  0  and  II  All 

>  0 

if  A^0  ,  where 
the  null  matrix 

0  is 

(2.16) 

l|A+B||  <  ||A||+  ||  B  || 

Further,  it  follows  immediately  that 

V  matrices  A 

and  B 

(2.17) 

> 

X 

1  A 

> 

• 

IMI 

V  A,x 

(2.18) 

> 

DO 

1  A 

> 

• 

l|B|| 

V  A,  B 

(2.19) 

It  is  these  last  two  properties  that  make  norms  so  useful  in  analyzing 
linear  mappings  and  linear  systems  of  equations. 

As  with  the  case  of  vectors,  we  can  propose  more  readily  computable 


the  norm  (2.12) 


the  L  norm  (2.13) 

OO 


norms,  such  as 
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llAll  i  =  max  Z  |  a.  .  | 

l<j<n  i=l  J 

n 

II  Alloo  =  max  E  I  aii  I 

!<i<n  j=l  J 


(2.20) 

(2.21) 


Distortion 

Combining  many  of  the  notions  we  have  defined  above,  we  can 
consider  the  norm  of  a  matrix  A,  for  example 

||  A  ||  =  max  i|  Ax  ||  (2.22) 

l|x||-l 

to  be  a  measure  of  the  distortion  under  the  transformation  x  -*■  Ax  . 

For  || A||  is  the  length  of  the  longest  vector  in  the  image  set 
{Ax}  of  the  unit  sphere  {x:||xj|=l}  under  the  linear  transformation. 

Quoting  results  to  be  found  in  any  treatment  of  singular  values, 
if  we  have  a  non-singular  square  matrix  A  with  singular  values  arranged 
such  that 


Pj  >  y2  >  ...  >!.„  >  0 

then  there  is  one  vector,  x^  ,  in  X  such  that  A  stretches  x^  by  y^, 

as  x  is  mapped  into  Ax,  of  Y.  There  is  a  second  vector,  x  ,  that 
1  1  n 
stretches  by  y^.  x^  and  x^  are  orthogonal  (assuming  y^  i  y^),  as  are 

Ax.  and  Ax  in  Y  .  A  unit  circle  in  the  plane  of  x,  and  x 
In  1  n 

is  mapped  into  an  ellipse  with  semiaxes  y^  and  y^  .  This  is  the 
greatest  distortion  which  can  occur  to  any  circle  in  X  . 

Further,  the  singular  values  give  the  Euclidean  norm  directly, 


for 
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Ml  =  Vj  (2.23) 

llA-'H  -  V.;1 

Orthogonal  matrices  are  useful  because 

1 1  Tx  1 1  =  ||  x  ||  (2.24) 


for  all  orthogonal  matrices  T  ;  orthogonal  matrices  preserve  length 
and  do  not  distort. 

Conditioning  of  Matrices 

An  especially  important  problem  to  consider  when  programming  an 
algorithm  is  the  impact  of  numeric  truncation  on  the  behavior  of  the 
computation.  Two  questions  need  to  be  asked: 

.1)  Will  the  algorithm  remain  stable? 

2)  Will  the  final  answer  be  correct? 

Again  we  consider  the  square,  nxn  ,  nonsingular  matrix  A,  this 
time  in  the  context  of  solving  the  linear  equation  Ax=b  for  the 
vector  x  .  We  begin  by  assuming  A  is  known  exactly,  but  we  assume 
b  is  perturbed  from  the  correct  value  by  an  amount  Ab  .  How  large, 
then,  can  Ax  be?  We  have 


Ml  •  IM|  >  |jb|| 


(2.25) 


Ax 


A_1Ab 


(2.26) 


llAxll  £  ||A_1|J  *  1 1  Ab  1 1 


(2.27) 
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We  begin  by  noting  that  for  the  proper  choice  of  Ab  ,  the  equality 
holds  in  (2.27).  Then,  reversing  (2.25),  multiplying  by  (2.27),  and 
assuming  b  is  nonzero,  we  have: 

11**11  •  M  £  II A |J  •  II A" 'll  •  ||*[|  •  || Ab ||  (2.28) 


Ax 

X 

< 


II  A | j 


(2.29) 


We  next  define  the  spectral  condition  number,  k(A),  to  be  the  quantity 
|| A ||  •  || A  ^||  .  For  the  Euclidean  norm,  we  have 

k(A)  =  Uj/Ph  ^  1  (2.30) 

Therefore,  the  condition  number  is  a  measure  of  the  maximum  distortion 
the  transformation  A  makes  on  the  unit  sphere.  Expression  (2.29) 
is  an  expression  of  the  relative  uncertainty  in  the  solution  vector  x 
due  to  relative  uncertainty  in  the  data  vector  b  .  It  can  never  be 
less  than  1  . 

Similarly,  if  there  is  uncertainty  in  A,  we  have 


Ax 

X 

<  k(A) 


(2.31) 


Caveats : 

The  bound  is  precise  in  that  for  proper  choice  of  b  and  Ab 


Ax 

x 

=  k(A) 


(2.32) 


Since  k(A)  is  never  less  than  unity,  uncertainty  can  never  decrease. 


k(A)  is  a  good  measure  of  the  quality  of  the  mapping.  Conditioning 


is  far  more  important  to  determining  performance  than  the  determinant 
or  the  order  of  the  system  [Moler,  1967].  Again,  we  see  that  orthogonal 
mappings  are  likely  to  be  well  behaved,  because  their  condition  number 
is  unity. 

It  is  important  to  note  that  even  small  systems  can  be  extremely 
ill  conditioned.  For  example,  the  condition  number  of  the  matrix 


99 

100 

98 

99 

(2.33) 


is  nearly  40,000;  this  matrix  will  introduce  serious  distortion. 


Orthogonal  Reductions 

A  popular  reduction,  and  one  that  we  shall  consider  often  in  the 
sequel,  takes  a  general  matrix  into  a  triangular  form,  with  all  zeroes 
either  above  or  below  the  diagonal,  for  example 


X  ...  X 

X  ...  X 

\ 

;  ; 

=> 

• 

. 

0 

\ 

X  •  •  •  X 

X 

\ 

One  of  the  better  behaved  triangularizations —  the  Givens  approach- 
proceeds  by  making  an  orderly  series  of  plane  rotations.  For  example, 
in  the  kth  step,  we  would  pre-multiply  by  rotations  in  the 
(k,k+l),  (k,k+2),  ...  ,  (k,n)  planes.  Zeroes  in  the  kth  column  are 
then  produced  one  at  a  time,  with  each  rotation;  zeroes  in  preceding 


columns  are  unaffected. 


The  Givens  transformations  are  easy  to  construct.  To  rotate  an 


angle  0  ,  we  have 


T  = 


cos  0 
-sin  0 


sin  0 
cos  0 


If  we  embed  this  rotation  in  a  higher  dimension  space,  we  can  rotate 
in  the  plane  (k,£)  through  pre-multiplying  by 


T  = 


COS0 


— sin© 


sin0 


cos0 

1 


,  for  k  <  £  <  n 


0  =  cos 


-1 


kk 


Compared  with  triangularization  by  Gaussian  elimination,  plane 

rotations  are  expensive:  Gauss,  with  complete  pivoting,  requires 

1  3  4  3  1  2 

y  n  multiplications,  compared  to  y  n  multiplications  and  j  n 

square  roots  for  Givens.  However,  with  Gaussian  elimination  the 

perturbation  error  bound  always  contains  a  factor  of  2n  *  ,  whereas 

the  Givens  reduction  never  displays  such  unwarranted  growth. 

We  next  consider  the  prima  reductions,  by  A.S.  Householder:  his 

famous  Householder  reflection.  These  transformations  seem  to  have  the 

essential  advantages:  they  are  orthogonal,  they  have  better  error  properties 

than  even  plane  rotations,  they  are  fast--  though  they  still  require 
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2  3 

y  n  multiplications  and  n  square  roots--  and  they  are  easy  to 
compute.  As  a  consequence,  they  are  often  recommended  sans  reservation. 

With  the  Householder  reflection,  we  map  the  chosen  vector,  "a"  , 
onto  the  first  coordinate  of  the  orthogonal  space,  but  reflected  from 
its  initial  orientation  in  that  space.  Schematically  (see  Figure  2.1) 
we  could  look  upon  this  reduction  as  a  series  of  Givens  rotations,  termin¬ 
ated  with  a  reflection,  lumped  into  a  single  mapping.  A  better  interpre¬ 
tation  can  be  seen  from  Figure  2.2,  a  second  order  example.  We  begin 
by  constructing  the  vector  "t"  with  the  same  coordinates  as  "a"  , 
except  for  the  first  coordinate  of  "t"  ,  which  is  a^  +  ||a||  .  We  then 
construct  the  plane,  P  ,  perpendicular  to  "t"  and  passing  through 
the  origin.  The  mapping  we  desire  will  reflect  all  vectors  through 
this  plane.  A  moment's  reflection  shows  that,  by  construction,  Ta 
produces  the  mirror  image  of  "a"  lying  along  "e^"  ,  the  first 
coordinate  of  the  space. 

Construction  of  T  is  especially  simple.  We  choose  T  to  be 
tt ' 

I  -  r-r—  .  Then,  we  have  that 
t  a 


Ta 


(T  tt' 

(I  '  F¥)a 


a 


tt  'a 
t '  a 


=  a-t 


as  desired. 


Furthermore,  T  is  orthogonal: 

T '  =  (I  -  y£y) '  =  I  -  — —  ,  since  t'a  is  a  scalar 
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tt '  2 

TT-  =  (i  -  ~r 


=  i 


=  i 


2tf  tt'tt* 
t'a  +  t'at'a 


2 

t'at'a 


(t  'a 


2 

t'at'a 


(t '  a 


=  I  ,  since  t't  =  2t'a 


As  may  already  be  clear,  to  reduce  a  column  of  A,  we  need  not 
multiply  A  explicitly  by  the  matrix  T  .  Rather  (see  Figure  2.3),  we 
calculate  ||a||  , 


1)  a  =  sign(a1)  ||aj| 


Then  we  calculate  "t"  by  augmenting  "a"  by  a  , 

2)  t  =  a  +  ae1 


which  only  requires  one  addition.  Next,  we  need  the  scale  factor 
t'a  ,  which  is  again  a  single  operation 

3)  8  =  t'a  =  (a+aep^a  =  crt^ 


Then,  for  any  vector,  "b"  ,  we  must  calculate  an  inner  product 

4)  tT  •  b 
normalize 

5)  y  =  tTb/S 

and  reflect  the  vector  "b" 
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If  A  is  symmetric,  but  not  positive  definite,  then  there  will 
be  no  real  non-singular  B  .  If  there  is  a  triangular  decomposition. 


then  B  will  consist  of  either  pure  real  or  pure  imaginary  columns. 
Thus,  we  may  write  B  as 

LS  ,  where  A  =  L£LT  ,  Z  =  S2 


where  L  is  a  pure  real  lower  triangular  matrix  and  Z  ,  the  signature 
matrix,  is  a  diagonal  matrix  whose  diagonal  elements  are  either  1  or  -1. 

For  the  symmetric,  positive  definite  case,  the  best  approach  to 
calculating  the  square-root  is  normally  Cholesky  decomposition,  a 
simple  and  well  behaved  algorithm.  We  define  the  vector  .  to  be 
the  elements  of  row  i  up  to,  but  not  including,  element  j  .  Cholesky 
decomposition  (in  place]  begins  by  first  zeroing  all  entries  above  the 
diagonal . 


=> 


Next,  we  step  down  the  diagonal,  performing  each  of  two  operations 
at  each  stage.  First,  the  diagonal  element,  a^.  ,  is  replaced  by 


a. .  <==  (a. .  -  V. .  V. .) 2 
}J  JJ  JJ  JJ 


Second,  we  step  down  the  jth  column,  beginning  with  row  i=j+l  .  Each 

element  a.  .  ,  is  replaced  by 
t  >  J 
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a.  . 
ij 


-T  - 

-  v! .v. . 

jj  3  J 


a .  . 


The  Cholesky  is  extremely  stable,  never  needs  pivoting,  and  is 
n3 

quick--  requiring  only  -g-  multiplications  and  n  square  roots.  But 
yield  positive  definiteness,  and  you  lose  the  numerical  stability  of 
symmetric  decomposition. 

Kaminski  [1971]  claims  Cholesky  decomposition  can  be  applied  to 
the  singular  case;  the  algorithm  he  presents  in  Appendix  A,  however, 
fails  for  some  singular  matrices.  In  contrast,  Wilkinson  [1965] 
states  "it  cannot  be  emphasized  too  strongly  that  the  symmetric  decomposi¬ 
tion  of  a  matrix  which  is  not  positive  definite  enjoys  none  of  the 
numerical  stability  of  the  positive  definite  case"  [p.  231]. 

For  the  singular  case,  we  use  Singular  Value  Decomposition. 


Singular  Value  Decomposition  (SVD) 

Singular  value  decomposition  uses  the  QR  algorithm  to  decompose 
a  matrix  A  into 


A  =  Q 
mxn  Xmxm 


nxn 

0 


nxn 


,  m  >  n  (w.o.l .g.) 


where  Q  and  R  are  orthogonal  and  S  is  diagonal  and  non-negative. 
Assuming  A  is  square,  symmetric,  and  non-negative,  then  Q  =  R. 

Then  the  square  root  of  A  is 


Singular  value  decomposition  is  accurate;  its  deficiency  is  its 
complexity.  According  to  Lawson  [1974]  Cholesky  decomposition  is  more 
than  twenty-five  times  faster  than  SVD.  Unless  A  may  be  singular, 
Cholesky  decomposition  is  to  be  preferred. 

Finding  Eigenvalues  and  Eigenvectors 

The  currently  best  accepted  algorithm  for  finding  eigenvalues  is 

generally  a  two  step  procedure:  reduce  the  matrix  to  a  standard  form, 

using  orthogonal  transformations  (for  example,  Householder  reflections), 

and  then  apply  an  iterative  eigenvalue  decomposition  routine  such  as 

the  QR  algorithm  to  generate  the  eigenvalues. 

For  a  general  matrix,  we  begin  by  reducing  the  matrix  to  Hessenberg 

form  --  all  zeroes  below  the  subdiagonal  --  while  still  preserving  eigen- 

5  3 

values;  this  requires  y  n  multiplications.  Then  we  begin  iterating, 

using  an  algorithm  that  steadily  reduces  the  magnitude  of  the  off- 

2 

diagonal  elements.  This  requires  4n  multiplications  per  iteration. 
Convergence  on  each  iteration  is  very  fast;  for  the  case  of  symmetric 
matrices,  it  is  at  least  cubic.'  Total  calculations,  including  all 

3 

iterations,  is  normally  about  4n  (Lawson  [1974]). 

Multiple  eigenvalues  do  not  pose  any  special  problems.  Their 
associated  eigenvectors,  however,  introduce  a  perversity  we  cannot  avoid. 
For  example,  with  symmetric  matrices,  multiple  eigenvalues  correspond  to 
a  circular  cross  section  in  the  corresponding  subspace,  and  therefore 
any  direction  is  intrinsically  an  axis.  Concommitantly ,  almost  equal 
eigenvalues  correspond  to  almost  circular  subspaces.  The  computational 
problem  remains  difficult  [Acton,  1970]. 
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Chapter  III 

THE  STEADY-STATE  SOLUTION  TO  THE  DISCRETE  RICCATI  EQUATION 


As  we  briefly  outlined  in  Chapter  1,  the  solution  to  the  linear 
recursive  state  estimation  problem  and  to  the  quadratic  control  problem 
are  strongly  connected  to  the  solution  of  certain  Riccati  differential 
and  difference  equations.  In  this  chapter  we  will  examine  a  variety  of 
algorithms  for  solving  the  appropriate  discrete  Riccati  equation.  Part 
of  this  analysis  will  build  upon  the  connection  between  estimation 
theory  and  scattering  theory  elucidated  by  Ljung,  Kailath,  and 
Friedlander  [1976].  The  introduction  of  a  physical  paradigm  leads  to 
new  insights  into  the  strengths,  weaknesses,  and  intrinsic  structure  of 
several  algorithms. 

Before  considering  the  solutions,  we  more  precisely  define  the 
problem.  Two  applications  of  linear  least-squares  theory  are  central 
to  the  development  of  this  chapter: 


PROBLEM  I--  State  Estimation 


Let  be  an  n-dimensional  discrete  vector  process  generated  by  white 

noise  driving  a  linear  dynamical  system: 


xi+i  =  *1  xi  +  ri  wi 


(1.1) 


and 


let  y^  be  a  linear  function  of  x^  ,  observed  through  white  noise 


y.  =  H.  x.  +  v. 

JX  11  1 


(1.2) 
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w.  and  v. 
1  1 


respectively 


are  referred  to  as  process  noise  and  measurement  noise, 
We  further  assume 


f  x  =  e-  w .  =  5,  v .  =o 

O  11 


fx  x  =0 
o  o 


T 

f  V  .  w . 

1  J 


F  w .  w  .  =  Q  •  6 .  .  f.  v.  v. 
i  J  1  iJ  i  J 


M  w.  =  0 
o  J 


F  6.  . 
ij 


m  v:  =  o 

o  J 


R.  6.  .  ,  R;  >  0 

1  ij  1 

/0  i^j 
1  i=j 


Objective: 

To  find  a  linear  estimate  of  x^  given  measurements  up  to, 

but  not  including,  measurement  i  .  This  estimate,  >  is  chosen 

to  minimize  the  expected  error  covariance. 


M. 

l 


i-1 


x.)  (x.  , 

l  i 


i-1 


-  x.)T} 


Vi 


(1.4) 


Solution : 

Assuming  R^  >  0,  then  the  optimal  linear  estimate,  x^ | .  ^  ,  in 
the  sense  of  minimizing  equation  (1.4),  is  given  by 


x.  .  |  .  =  (I-K.H.)<f>.x.  |  .  . 
l+l  l  ^  11111-I 


K.y. 

l  l 


where 


T  -1 

K.  =  P.  Hf  R. 

l  ill 
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and  the  error  covariance  in  (1.4),  Nh  | ^  ^  ,  i s  given  by  the 
following  pairs  of  equations: 


Measurement  Update 


P.  I  .  =  M 
11 


-  M. 


iTi-1 


,  .  -  I  .  .  h:  [R.  -  H.  M.  I  .  ,  H  I'  *  H.  M. 

lll-l  lll-l  ll  1  1  lll-l  1J  11 


(1.7) 


Pi 


Time  Update 


i+1  i 


T  T 

!>.  p.  . .  <*)'  +  r.  q.  r! 

i  1  1  1  i  xi  i 


=  ${(X.  ,  .-X.)  (X.  .  ,-x.)T) 

1+1  1  1  1+1  1  1J 


Equations  (l.S)  through  (1.7)  comprise  the  Kalman  filter  for  discrete 
processes . 

PROBLEM  II--  Closed  Loop  Control 

Let  be  an  n-dimensional  discrete  vector  process,  driven  by  an 

external  input  u^  ,  where  the  governing  linear  dynamical  system  is 
again 

x.  ,  =  <t>.  x.  +  r.  u.  ,  Vi,0<i<N  (1.8) 

xq  given 

where  is  a  deterministic  input  sequence. 


To  find  the  input  sequence  u^  which  minimizes  a  specified  scalar 
cost  function  of  the  form 


J  = 


N 

%  l 

i=0 


[xT  A.x.  +  uT  B.  u.  ] 
1  11  1  ii 


where  A.  and  B.  are  symmetric  matrices;  A^  is  non-negative  definite 
and  is  positive  definite. 

Solution: 

Assuming  B^  >  0,  then  the  optimal  input  u^  in  the  sense  of 
minimizing  equation  (1.9)  is  given  by 


u. 

i 


=  “Vi 


(1.10) 


where 


C. 

l 


r. 

i 


|).  (S.-A.) 

i  i  l 


(1.11) 


and  where  S. 

l 


can  be  found  by  solving 


S. 

l 


(s:11+r\  B^r1)-1 

^  i+i  i  i  i/ 


4> .  +  A. 
1  1 


SN  =  AN 


(1.12) 


Caveats : 

Both  problems  are  discrete-time  applications,  often  informally 
referred  to  subsequently  as  discrete  systems.  Amplitude  will  always  be 
treated  as  a  continuous  dimension  in  the  domain  of  the  model.  Actual 
computations  will  never  involve  real-valued  quantities,  but  will  be  "good" 
discrete  approximations,  depending  upon  the  precision  available  for  the 


calculation. 


Most  problems  will  involve  the  steady-state  solution  to  time 
invariant  problems.  In  this  eventuality,  the  subscripts  will  be  dropped 


from  the  model  parameters  to  simplify  the  notation. 

Another  simplification  accrues  from  recognizing  the  similarity 
between  the  two  applications.  With  the  introduction  of  an  appropriate 
mapping  between  variables,  known  as  duality,  a  solution  to  one  problem 
can  be  claimed  as  the  solution  to  its  dual  problem.  This  duality  is 
contained  in  the  following  substitutions: 

CONTROL  <=>  ESTIMATION 
T 

r  h 

B  R 

A  rQrT 

SN  P0 

<t>  <PT 

In  the  sequel,  most  algorithms  will  be  developed  for  the  estimation  problem. 
For  some  algorithms,  however,  the  derivation  is  clearer  in  the  control 
context.  Duality,  ensures  that  a  solution  for  one  application  can  always 
be  adapted  to  the  other  application  using  the  above  mapping;  the  choice 
of  problem  will  be  based  primarily  on  considerations  of  physical  in¬ 
sight  . 

The  previous  chapter  reviewed  matrix  fundamentals  and  basic 
algorithms.  In  this  chapter  these  fundamentals  will  be  used  to  develop 
algorithms  for  solving  the  two  fundamental  applications.  Section  2 
defines  criteria  for  analysis,  briefly  touching  on  numerical  problems. 
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Section  3  describes  an  algorithm.  Eigenvector  Decomposition  (EVD),  which 
exploits  modal  decomposition  along  eigenvectors  to  solve  the  Hamiltonian 
(the  model  of  the  system  and  its  adjoints).  Section  4  elaborates  properties 
of  orthogonal  transformations,  which  are  then  used  in  Section  5  to 
develop  a  class  of  iterative  algorithms;  these  algorithms  are  valuable 
because  of  their  numerical  properties.  Section  6  introduces  another 
perspective  on  estimation  and  control  Riccati  equations  that  evolved  from 
scattering  theory. 

In  Section  7  scattering  theory  is  then  used  to  motivate  another 
class  of  algorithms,  the  square-root  doubling  (SQD)  algorithms.  In 
Section  8  square-root  algorithms  are  developed  purely  within  the 
scattering  theory  framework.  In  the  last  substantive  section.  Section 
9,  various  weaknesses  of  these  algorithms  are  reviewed,  and  proposed 
revisions  are  considered. 


III. 2  Iteration 

The  benchmark,  by  which  all  methods  should  be  compared,  is 
straight  iteration  of  the  Riccati  error  covariance  equations: 


Mi+1  =  ^i  Pi  ‘t’i  +  ri^ri  »  where  p0  =  0 


P.  =  M.  -  M.  HT(R.  +  H.M.hT)'1  H.M. 
1  1  11  1  111  11 


(2.1) 

(2.2) 


and  where  the  filter  gain  is  calculated  as  a  function  of  this  covariance 


T  -1 

K.  =  P.  h!  R. 

i  ill 


(2.3) 
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Several  problems  can  arise  while  trying  to  iterate  equations  (2.1)  and 
(2.2)  to  steady-state.  These  equations  may  converge  inordinately  slowly, 
making  the  calculation  expensive.  The  calculation  may  also  be  inaccurate. 
The  error  covariances  may  become  indefinite  due  to  truncation  errors; 
this  has  an  unfortunate  effect  on  the  feedback  gains,  .  In  a 
related  problem,  ill-conditioning  can  lead  to  inaccurate  convergence, 
or  even  divergence . 

We  will  address  these  latter  questions  when  we  discuss  the  square- 
root  algorithms  in  Section  III. 5.  The  first  issue,  cost  of  computation, 
we  will  discuss  only  in  relative  terms.  In  Table  III.l  we  present 
operations  counts  for  several  of  the  algorithms.  The  first  major  row 
accumulates  initialization  overhead,  the  second  major  row  accumulates 
iteration  overhead,  and  the  third  row,  termination  overhead.  After  a 
relative  accounting  of  the  number  of  iterations  required  to  terminate, 
a  final  column  presents  cost  in  cycles  assuming  computations  for  a 
single-input/single  output  system,  and  for  a  system  with  as  many  inputs 
and  outputs  as  states. 

For  costs  we  have  assumed  the  following 

additions  take  two  cycles 
multiplies  take  three  cycles 
divisions  take  five  cycles 
square  roots  take  forty  cycles. 

These  ratios  should  be  fairly  typical  for  modern  computers--  from  small 
to  large.  We  have  assumed  square  roots  will  be  done  in  software,  using 
hardware  addition,  multiplication,  and  division. 
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We  note  immediately  that  the  computation  time  for  square  roots  has 
a  negligible  impact  on  the  cycle  cost  precisely  because  the  algorithms 


always  require  0(n)  square-root  computations;  for  all  algorithms 
considered,  at  most  4n  square-roots  need  to  be  computed. 

Table  I I I. 2  presents  the  computation  costs  for  the  fundamental 
operations  we  assumed  (matrix  multiply,  Cholesky  decomposition,  etc.) 


III. 3  Eigenvector  Decomposition 

We  begin  by  solving  the  dual  to  the  optimal  filter  problem-- 
the  optimal  quadratic  regulator  problem.  Following  the  approach  of 
Bryson  [1975],  the  minimal  cost  function  is  determined  from  a  two-point 
boundary  value  problem.  From  the  previous  section,  given  the  model 


x 


i+1 


x . 

l 


+  r  u. 

i 


(3.1) 


and  a  quadratic  cost  function 


J 


^  I  x  ♦  A  x. 

i=0  1  1 


(3.2) 


where  A  and  B  are  symmetric;  A  is  positive  definite  and  B  is 
positive  semi-definite.  We  wish  to  find  a  linear  feedback  specification 

u.  =  -C.  x.  (3.3) 

ill 

such  that  J  is  minimized.  Then  N  is  allowed  to  go  to  infinity;  if 
a  steady-state  is  reached  then  will  be  a  constant,  C  . 

The  derivation  begins  by  noting  that  u^  will  affect  only  states 
outside  the  range  of  interest.  We  can  therefore  rewrite  (3.2)  as 


34 


J  =  h  XI  A  x,  +  h  £  xT  A  x.  +  uT  B  u. 
N  N  .  _  1  1  i  i 

1=0 


(3.4) 


We  then  minimize  J  via  the  calculus  of  variations,  by  augmenting  J 
with  an  undetermined  multiplier,  A^  .  Then 

3  -  *1  A  *n  -  "n  *  y  xi'  *  Ho  ■ 

i=l 

where  H .  is  given  by 

H.  =  4  xT  A  x.  +  %  uT  B  u.  +  AT  ,  (<j>x.+r  u.)  Vi,  0<i<N 

xi  11  i  i+i v  i  i  ’  - 

(3.6) 

We  now  consider  differential  changes  in  J  due  to  differential  changes 
in  ih  .  The  appropriate  choice  for  A^  is  then 


3H 

IT  -  XI  ‘  0 

1 


Vi  ,  0<i<N 


(3.7) 


T  T 

V  -  XN  =  0 


This  gives 


N-l  3H. 

dJ  =  Z  ~  du(i)  +  A  dx 
i=0  3ui  00 


(3.8) 


For  an  extremum,  dJ  must  be  zero  for  arbitrary  du^  .  Therefore, 


9H. 

^  =0  Vi  ,  0<i<N 
du.  ’ - 

l 


(3.9) 


Installing  (3.6)  in  (3.9),  we  have 
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T  T 

u  B  +  A  ,  r  =  0 

1  l+l 


u.  =  -  b'1  rT  x. 

i  i+i 


Using  (3.10)  in  (3.1),  and  rewriting  (3.7),  gives 


(3.10) 


x.  =  <j>  x.  -  r  b"1  rT  x.  .  x  =  x(o) 
1  +  1  y  1  1  +  1  o 


(3.11) 


T  T  T 

x!  =  X.  .  d>  +  x:  A 

l  i+i  *  l 


AN  =  XN  A 


the  Euler- Lagrange  difference  equations.  Formulated  in  state  space 
notation,  we  have 


i+1 


-1  T  -T  -IT  -T 

4>+r  b  r  <|)  a  -r  b  r  <j> 


-T 

-4>  A 


<P 


-T 


(3.12) 


assuming  <f>  is  invertible.  The  square  matrix,  called  the  Hamiltonian 
of  the  system,  is  symplectic.  That  is,  for  each  eigenvalue  v^^  there 
exists  an  eigenvalue  v^  with  the  same  multiplicity,  where  v^  equals 


1/v.  . 


The  vector  X.  can  also  be  expressed  as  a  linear  combination  of 


th 


e  x^  .  Calling  this  mapping  P^  ,  (3.12)  becomes 

pi  =  <f>T(piI1+rB~1rT)"1  <f>  +  A 


(3.13) 


where 


Xi  =  P,  x, 


A 
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Vaughn  [1970]  used  equation  (3.13)  and  the  fact  that  H  is 
simplectic  to  extend  Potter's  original  solution  for  the  equivalent 
continuous  system  [1966] .  He  considered  the  array  of  column  vectors 
[TgjTjj]  which  are  the  eigenvectors  of  (3.12)  associated  respectively 
with  the  stable  and  unstable  eigenvalues.  Then,  defining 


to  be  a  partition  of  T  based  on  the  equations  for  x-  and  , 
the  steady-state  solution  for  can  be  written  as: 

P=AUXU~1  .  (3.15) 

The  algorithm  for  calculating  the  steady-state  controller  gains, 

C  ,  is  therefore  straightforward.  After  computing  the  matrix  (3.12), 
the  matrix  of  eigenvectors  is  determined,  using,  for  example,  the  QR 
algorithm.  Given  T^  ,  (assuming  a  complete  set  of  eigenvectors  exist) , 
then  (3.15)  follows  directly. 

We  note  in  passing  that  the  closed  loop  system  is  stable  if 

1)  The  system  is  stabilizable  (i.e.,  if  the  unstable  modes 
are  controllable)  through  the  control  gain  T  ,  and 

2)  The  system  is  detectable  (i.e.,  if  the  unstable  modes  are 
observable)  in  the  cost  function  J. 

Similarly,  for  the  Kalman  filter,  the  closed  loop  system  will  be 
stable  if 

1)  The  system  is  detectable  (i.e.,  if  the  unstable  modes  are 
observable) ,  and 


2)  The  system  is  stabilizable  (i.e.,  if  the  unstable  modes 
are  controllable) . 


The  implementation  of  this  algorithm  is  predicated  upon  a  numer¬ 
ically  well-conditioned  eigenvalue  decomposition  procedure.  As  outlined 
in  Chapter  II,  by  choosing  the  QR  algorithm  we  would  expect  excellent 
performance.  This  has  been  our  experience  (see  Chapter  VI). 

On  the  negative  side,  several  assumptions  made  during  the  deriva¬ 
tion  of  this  method  preclude  the  solution  of  interesting  and  important 
problems.  For  examj  e,  the  state  transition  matrix,  (J>  ,  was  assumed 
invertible.  In  systems  incorporating  pure  delay,  invertibility  may  not 
obtain.  In  addition,  the  control  weighting  matrix,  B  (or  the  measure¬ 
ment  noise  covariance  matrix,  R) ,  was  assumed  non-singular;  some 
observations,  however,  may  be  uncontaminated  by  noise  leading  to  a 
singular  weighting  matrix  (see  Chapter  V). 

At  another  level,  the  existence  of  a  complete  set  of  eigenvectors 
was  assumed.  This  may  not  be  true  if  the  system  has  repeated  closed- 
loop  eigenvalues. 

In  each  of  these  cases  the  Riccati  equation  may  still  have  a  unique, 
positive,  steady-state  solution.  These  issues  will  be  reconsidered  in 
Section  III. 9  (which  proposes  algorithm  revisions)  and  again  in  Chapter 
Five,  under  differentiating  applications. 

III. 4  Square  Roots  and  Orthogonal  Transformations 

Several  facts  about  orthogonal  transformations  will  clarify  the 
results  of  the  subsequent  sections.  These  deal  with  the  implicit  nature 
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of  the  orthogonal  mapping  and  with  an  indirect  mechanism  for  calculating 
inverse  square  roots  via  triangularization . 

The  previous  chapter  extolled  the  many  virtues  of  these  mappings. 
They  provide  an  additional  benefit  when  working  with  the  square  roots 
of  matrices.  If  two  arrays  are  equal,  for  example,  if 

A  =  B 

T/2 

inserted  between  A2  and  A  '  : 

,  =  I 

,  is  only  unique  to  within  an 

structures,  then  there  is  an 


Specifically,  if  B2  is  lower  triangular,  then  any  orthogonal  triangular- 
ization  algorithm  can  be  used  to  map  A  .  We  will  always  use  Householder 
reflections,  for  the  reasons  outlined  earlier. 


aV'2  =  bV/2 


then  any  orthogonal  transform  can  be 


Ah  »T  AT/2  -  B^  B^2 


"T  /  2 

The  matrix  square  root.  A2  or  B  ' 
orthogonal  transformation. 

If  A2  and  B2  have  different 


orthogonal  transformation  such  that 


To  calculate  [I  +  AA^]  ^  or  [I  +  A^A]  * 
an  orthogonal  triangularization  to  the  array 


we  consider  applying 


[i : at]  •  c r  =  [Bio]  ,  b  =  bk 


(4.1) 
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where 


er 


.  eJ  = 


Then,  if 


er  = 


T  T 
1  12 


T  T 
21  2 


(4.2) 


we  will  demonstrate  that 


B  =  T 


-T 

1 


A  =  T21  Tl_1 

[I  ♦  ATA] _1  =  Tx  .  t[ 

[I  +  AA1]'1  =  T,  *  To 


(4.3) 

(4.4) 

(4.5) 

(4.6) 


To  see  this,  multiply  (4.1)  on  the  right  by  O'  .  Then  we  have 


[HA1]  =  [B;0] 


rTT 

1 


12 


,T  i 
21 

,T 


(4.7) 


T 

A  =  B 


B  •  T 

T 


1 

,T 

21 


(4.8) 


Squaring  both  sides  of  (4.1)  gives 


[I  +  ATA]  =  B  •  BT  =  (T^T^)'1 


(4.9) 


40 


Similarly,  using  (4.9) 


[I  *  aat]  -  [I  *  Tn  t;1  t;t  4, 


t2  *  t21  t;1  t;t  tJ  t2]t~' 


Examining 

T 

Cfc1  .  to*' 


= 

O 

= 

0 

>— < 

. 

J 

T  T 
T  T  +T  T 
12*1  ‘2*21 


T  T 

T  T  +T  T 

12  12  2*2 


C  D 


We  have,  from  element  C  of  0  -O'  , 


I  +  AA1 


_Tt  T  -ITT  1 

T,  [T,T,  ♦  I12  Tj  T:  Tl  Tj  T12]T2 


‘2  1  2  2 

"  T2TfT2T2  +  T!2  T12^2 


=  (T2T^)  1  ,  from  element  D  of  fr7*©" 


(For  an  alternate  proof  of  these  results,  see  Bierman  [1976].) 

We  can  generalize  these  results  further.  For  the  case  of 
[R  +  AA1]'1  and  [R  +  ATA]  1  , 


we  have 
[r'5;A]0'  =  [B  >0]  B  =  bx 


(note:  we  have  constructed  the  left  array  with  A  ,  not  A  ) 


t  _  rr 

B  =  R  2T^ 


A  =  R^TjT'j 


T21  =  aTr'T/2t1 


[R  +  AA7]’1  =  R-T/2T1T7R-!s 


[R  +  ATA]_1  =  R_!5T7T7r"T/2 


i 

1 

1 
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We  typically  need  to  work  with  the  elements  of  O'  ,  although  we  do 

not  want  to  calculate  the  elements  of  the  matrix  explicitly.  This  is 

T  -T/2 

easily  done.  For  example,  to  calculate  T7  =  [I  +  AA  ]  , 

construct 


The  calculation  of  the  inverse  square  root  is  therefore  straightforward, 
and  implicit. 

We  will  also  need  to  concatenate  orthogonal  transformations.  For 
this,  if 

AAT  =  I  and  BBT  =  I 

then 

(AB)  •  (AB)T  =  I 


and  defining  C  and  D  such  that 
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III. 5  Square  Root  Algorithms 

The  earliest  class  of  square-root  algorithms  recursively  computed 
the  square  root  of  the  Riccati  difference  equation.  Many  have  dealt 
with  this  subject,  including  recently  Golub  [1965],  Schmidt  [1970], 
Kaminski  and  Bryson  [1971,1972],  Morf  and  Kailath  [1975],  and  Bierman 
[1977]. 

The  properties  of  orthogonal  transformations  presented  in  the  last 
section  can  be  used  to  derive  these  algorithms  in  a  purely  algebraic 
fashion.  In  so  doing,  interesting  questions  arise,  and  the  stage  will 
be  set  for  the  scattering  derivation  to  follow. 

Beginning  again  with  the  Riccati  equation  for  the  estimation 
problem 


M.  .  =  <h  -  P .  r 
l+l  ri  iri 


T  T 

1  +  r.q.r. 

ixi  i 


(5.1) 


P.  =  M.  -  M.HT(R.+H.M.H1)’1H.M. 
1  1  111111  11 


(5.2) 


(5.2)  can  be  rewritten  to  eliminate  the  subtraction 


P.  +  M -hT(R. +H.M.hT)  1  H.M.  =  M. 

l  il  l  ill  ii  l 


(5.3) 


The  quantities  of  interest  are  still  and  P^  .  We  also  note 

that  (5.1),  (5.2),  and  (5.3)  are  all  symmetric  equations,  in  the  sense 

T 

that  each  term  can  be  written  as  AA 

It  is  clear  how  to  separate  (5.1): 


u 1 


[<j>.P.-zT.Q.2]a/1  =  [M.2  .*0] 

Lri  l  ixiJ  1  1  l+l i  1 


(5.4) 


Taking  transposes,  and  squaring  both  sides  produces  equation  (5.1). 
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To  calculate  (5.3)  as  a  square  root,  we  need 
M  *  on  the  left 

P2  and  MH^(R+HMH^)  "^2  on  the  right. 

T  - 1 

We  begin  by  computing  the  inverse,  (R+HMH  )  .  From  the  previous 

section,  we  have 

[R*2;  HM^]&2  =  [B  [0]  B  =  rS\T  =  [N,.  (5.5) 

T 

where  ©2  implicitly  computes  the  inverse  of  R  +  H  MH  .  Examining 
the  elements  of  0"9 : 

T1  =  RT/2[R  +  H  M  HT]-T/2 
t21  =  M^H(R+HMHT)'T/2 

T2  =  R^R  +  HTMH]-T/2  (5.6) 


We  note  that  is  precisely  the  quantity  we  need,  for 


ft2 

IIM2 

B  0 

0 

\t2 

II 

b" 

MHT(R+1MHT)'T/2  c_ 

(5.7) 


k 

and  C  must  be  P  ,  as  desired. 

We  now  have  two  update  equations,  which  we  would  like  to  combine 
into  one  (with  a  single  orthogonal  mapping) . 

In  the  next  section  we  will  show  how  to  concatenate  mappings.  In 
this  section  we  will,  instead,  quickly  rederive  the  formulas;  but  this 
time  for  a  single  update.  Beginning  by  combining  equations  (5.1)  and 
(5.2): 
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Mi+1  =  ^Mi^T  +  FQrT  '  4>MiHT(R+HMiHT)'1HM.(|)T 


remove  the  minus  sign 


Mi+1  +  =  <{>Mict)T  +  rQrT 


construct  the  inverse  transformation 


[R^jHM?]®^  [b|o] 


and  note  the  result  of  this  mapping 


(5.8) 


(5.9) 


(5.10) 


T  =  Nf?HT(R+HM.HT)‘T/2 

21  l  l 


R^  m?  o'  =  B  0 

1 

I 

0  (f)M^  ^M-H^R+HM-H7)'172  C 


(5.11) 


The  two  sides  are  nearly  complete.  Adding  the  final  term,  we  have 


HM?  0  1  Or'  =  T  B 


0  0 


(5.12) 


0  FQ^ 


<JM.HT(R+HMiHT)"T/2  C  0 


where  again,  C  must  equal  M?+1  (which  f  ’lows  immediately  from 
squaring  both  sides) . 

It  may  seem  surprising  that  the  three  step  procedure  outlined 
initially  coalesces  so  neatly  into  a  uniform,  single  step  procedure. 

This  interrelationship  of  the  individual  component  equations  is  a  product 
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of  the  intrinsic  structure  of  the  problem.  We  shall  examine  this 
structure,  in  terms  of  scattering  layers,  shortly. 

Comments  on  the  Strengths  and  Weaknesses  of  Standard  Square  Root  Algorithms 
As  a  positive  note,  the  square  root  algorithms  provide  improved 
accuracy  in  the  final  result.  This  improvement  arises  for  two  reasons. 
First,  since  the  iterated  variable  is  the  square  root  of  the  error 
covariance,  this  variable,  when  squared,  must  be  positive  semidefinite ; 
the  calculated  error  covariance  can  never  evolve  negative  eigenvalues. 

In  contrast,  no  such  assurances  can  be  made  for  normal  iteration. 

The  conditioning  can  be  examined  analytically  by  introducing  the 
spectral  condition  number,  *  ,  derived  in  the  previous  chapter. 

Given  the  condition  number  for  a  matrix  P  ,  it  follows  directly  that 
the  condition  number  for  the  square  root  of  P  is  the  square  root 
of  the  condition  number  for  P  : 

P  =  LLT  (5.13) 

k(L)  =  k(P)^  ,  since  the  singular  values  of  L  are 

the  square  root  of  the  singular  values 
of  P  . 

Recall  that  spectral  conditioning  was  defined  from  the  expression 


Ax 

Ab 

X 

b 

If  the  computer  calculates  in  floating  point  with  2n  bits  of 


Ab 


,l-2n 


(5.15) 


accuracy,  then 
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Thus,  for  square  root  algorithms,  approximately  half  the  word  length 

(n  bits  of  accuracy),  should  be  sufficient  to  provide  the  same  error, 
Ax  I 


As  Kaminski  [1971]  points  out,  this  analysis  applies  to  the  linear 
case,  but  the  Riccati  equation  is  non-linear.  For  example,  in  the  case 
of  two  equations  updated  separately  (e.g.,  (5.4)  and  (5 . 7) ), Kaminski 
found  that  the  appropriate  error  equations  were 


Ax 

_  iff  I  ^ 

AL 

_  4.  _ 

Ay 

X 

1  L 

T 

y 

+  *  CD  * 


1 


r 

y-r 


AL 


As  long  as  the  residual  is  small  compared  to  the  update,  the  term 

associated  with  the  squared  spectral  condition  number  can  be  ignored; 

this  is  usually  the  case  when  the  number  of  measurements  are  much 

fewer  than  the  order  of  the  system.  However,  even  if  this  is  not  the 
2 

case  and  the  k  term  dominates,  the  square  root  filter  is  never 
more  ill-conditioned  than  the  normal  iterative  method  (Kaminski 
[1971]). 

For  a  final  note,  we  turn  to  Table  3.1  to  point  out  that  the  square- 
root  implementation  typically  requires  more  computation.  Variation  within 
the  classes  of  algorithms  is  large,  depending  upon  the  particular  problem 
being  solved  and  the  strategy  adopted  (for  example,  choosing  Givens, 
Householder,  or  square  root  free  implementations;  see,  for  example, 

Bierman  [1977]).  The  square  root  algorithms,  however,  often  requires 
up  to  40%  more  computation  than  the  normal  iterative  algorithms  (see 
Table  3.1)  . 
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Faster  Algorithms 

If  the  model  parameters  are  time  invariant,  Morf  et  al .  [1975] 
have  shown  that  a  considerable  computational  savings  can  be  realized  by 
working  with  the  square  roots  of  AP^  ,  instead  of  P^  .  Assuming 
known  initial  conditions,  or  if  we  assume  they  are  zero  (if  the  steady- 
state  error  covariance  is  the  objective)  then, 


A  P.  =  P.  ,  -  P. 
1  l+l  l 


is  non-negative  definite,  and  of  non-increasing  rank.  Therefore,  as 
explained  earlier,  we  can  factor  A  P^  as 


A  P.  =  L.  l: 
i  ii 


where  L.  is  of  size  nxa  ,  where  a  is  the  rank  of  A  P.  .  Since 
i  i 


4  po  *  P1  -  *0  ■  rQr 

a  must  be  at  most  the  number  of  inputs  to  the  system. 
Using  the  fact  that 


P.  =  P.  +  l.l: 
l+l  l  ii 


we  can  derive  the  square  root  forms  as 

pa  p 


p 

HL.  . 

l-  1 

(R^  0 

l 

n 

K 

<f>L.  , 

r  = 

K. 

L. 

i-1 

1-1 

l 

l 
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1 

where 


=  R 


K0  =  0 


J<  J, 

Lo  =  r^' 


Comments  on  Strengths  and  Weaknesses 

The  fast  algorithms  are  expected  to  have  error  properties  similar 
to  those  outlined  for  the  original  square  root  algorithms;  this 
conjecture  has  yet  to  be  demonstrated.  They  are,  however,  several  times 
faster  than  other  straight  iteration  algorithms,  as  can  be  seen  from 
Table  3.1. 


III. 6  Scattering  Theory 

Scattering  theory  describes  the  reflection  and  transmission 
characteristics  of  a  layered  medium  as  particles,  waves,  etc.  t  pass 
through  that  medium.  Linear  least  squares  estimation  and  control 
describes  the  propagation  of  states  and  error  covariances,  etc.,  as 
they  pass  through  time.  Both  give  rise  to  Riccati  and  related  equations. 
As  with  the  estimation-control  duality  introduced  in  Section  III.l,  a 
correspondence  can  be  made  between  Riccati  equations  arising  from 
different  contexts.  Identifying  similar  terms  found  in  different 
equations  provides  a  mathematical  isomorphism  that  can  then  lead  to 
new  insights. 

In  this  dissertation  we  will  be  working  with  a  particular  theory 
of  scattering  originally  developed  by  Redheffer  in  1950  ([1950],  ...  , 
[1962]).  Redheffer  set  out  the  solution  to  the  scattering  problem  in 


— w~'  ' 
- - -  ^ 
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equations  directly  related  to  the  Riccati-type  formulae.  When  Ljung, 
Kailath,  and  Friedlander  [Ljung,  1976],  [Friedlander,  1976]  explored 
the  connection  with  linear  estimation,  several  key  concepts  were  introduced. 
Redheffer's  work  could  be  interpreted  as  associating  time  with  layers  in 
the  estimation  context,  and  it  therefore  included  time-varying  parameter 
models . 

Another  approach  is  taken  in  the  study  of  geophysics,  which  considers 
the  scattering  of  waves  in  layered  media.  This  scattering  corresponds 
to  constant  parameter,  time  invariant  models,  and  leads  to  models  where 
layers  correspond  to  different  system  orders  or  state  components.  Early 
published  work  was  done  by  Wiggins  and  Robinson  [1965] .  See  also 
Claerbout  [1976], 

The  scattering  of  waves  provides  a  conceptual  structure  for 
visualizing  updates  of  estimates,  states,  adjoint  variables,  or 
covariances  via  the  Riccati  equation.  Updates  in  the  scattering  context 
are  effectively  generalized  to  include  the  combination  of  any  two 
adjacent  layers  into  an  equivalent  single  layer.  In  the  estimation 
context,  these  layers  can  correspond  to  two  arbitrary,  adjacent  blocks 
of  time.  Alternatively,  they  can  correspond  to  different  components  of 
a  state-vector,  or  to  operations  arising  from  [feedback]  loop  removal. 

The  operator  which  describes  this  union,  the  star-product,  leads 
to  a  powerful  set  of  matrix  manipulations.  These  manipulations  are 
often  more  convenient  to  use  because  of  their  immediate  connection  with 
the  estimation  or  control  problem  being  considered. 
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BASIC  OPERATIONS 

We  define  a  canonical  scattering  layer  as  in  Figure  6.1.  A  signal 
from  the  left  will  be  transmitted  (with  amplitude  T^)  and  reflected 
(with  amplitude  R^) .  Similarly,  for  the  signal  entering  from  the 
right.  The  signal  exiting  from  the  left  is  composed  of  a  weighted  sum 
of  both  entering  signals.  Schematically,  we  represent  this  layer  as 
a  matrix,  with  the  transmission  coefficients  on  the  diagonal: 


Looking  at  Figure  6.2,  it  is  clear  how  to  concatenate  two  layers; 
the  concatenation  would  be  straightforward,  except  for  the  loop,  A  . 
Using  a  simple  application  of  Mason's  Rule  for  networks,  the  result  of 
a  star-product  can  be  easily  "read  out"  by  following  the  arrows  along 
each  path.  This  procedure  gives 


'al 

pl' 

★ 

a2 

p2' 

'a2(I-p1r2)_1a1  P2+a2P1(I-r2P1)'1a2" 

_rl 

al. 

1 

H 

K) 

a2 

rl+alr2(-I'Plr2-)"lal 

(6.1) 

as  the  definition  of  the  star-product.  Note  that  the  terms  (I-p^^)  1 
and  (I-^Pj)  *  arise  because  of  the  loop  A. 

In  Figure  6.3,  we  examine  the  relationship  between  the  scattering 
layer  and  the  state  transition  matrix.  Schematically,  to  convert  from 
one  to  the  other,  we  need  to  reverse  the  flow  along  one  of  the  transmission 


Domain  Translation 


paths.  As  with  the  definition  of  star-product ,  the  relationship  is 
algebraic: 


r*ar 

i+i 

A 

B 

M1}1 

i 

xp3 

*1+1 

C 

D 

(2) 

x: 

l 

where  A  and  D  are  transmission 
coefficients,  and 

B  and  C  are  reflection  coefficients 


(6.2) 


Then,  assuming  the  determinant  of  D  does  not  equal  zero,  we  have 


(«' 
i  +  l 

A- BD_1C 

BD"1 

(2) 

i 

-D_1C 

d"1 

t — \  ’ — < 

CN  + 
^  -H 

X 

where  D  is  denoted  as  the  pivot  element .  We  could  also  pivot  about 
A,  assuming  A  is  non-singular,  so  that 


a"1  -a_1b 

rx(in 

1+1 

CA~ 1  D-CA_1B 

(2) 

X. 

1 

(6.4) 


This  mapping,  called  an  exchange  step,  converts  a  transition  matrix  to 
a  scattering  layer.  The  same  mapping  converts  a  scattering  layer  back 
to  the  appropriate  transition  matrix. 


A  simpler  interpretation  obtains  by  pictorially  examining  the 
exchange  step  (see  Verghese  [1978]).  Beginning  with  a  network  of  the 


the  transmission  flow  can  be  reversed  by  inverting  the  gain,  and  by 
changing  the  sign  of  flows  into  the  path  (see,  for  example.  Mason  and 
Zimmerman  [I960]), 


t 


Applying  this  procedure  to  the  scattering  layer  converts 


into  the  transition  matrix 


PROPERTIES  OF  SCATTERING  LAYERS 


It  will  be  useful  in  the  sequel  to  choose  a  notation  that 

differentiates  the  scattering  layers  and  transition  matrices  more 

* 

explicitly.  We  define  A  to  be  a  scattering  layer,  A  (unadorned) 

E 

to  be  a  transition  matrix,  and  A  to  be  the  mapped  equivalent  of 
?  *E 

A-  ,  i.e.,  A  =  A  .  This  matrix  identity,  I  ,  remains  the  same  in 
both  domains: 


PI)  I  =  I* 

We  will  also  have  need  for  a  modified  identity,  the  J  matrix, 
which  we  define  as 


P2) 


J  *  A  =  J  •  A 


=  J 


J  can  be  considered  as  a  particular  choice  of  signature  matrix  (see 
Chapter  II). 

From  Figure  6.3,  we  can  see  an  additional  implication  of  the 
exchange  step  mapping,  for 

ic  k  k 

P3)  A  *B  =  C  then  A  *B  =  C  ,  assuming 
we  can  pivot  each  of  the  arrays. 

Similarly 

P4)  If  A  *B  =  C  then  A*B  =  C 
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Inverse  transition  matrices  have  a  particularly  useful  property  (see 
Friedlander  [August,  1976]). 

PS)  If  A* A  *  =  I  then  A  *  A_1=  I 

From  these  properties,  we  can  also  derive  many  of  the  important 
orthogonal  properties 

P6)  If  0'*(Tr  =  I  then  e**(&T)*  =  I  (from  P3) 

P7)  If  Or‘0'T  =  I  then  ff*CrT  =  I  (from  P5) 

A  slightly  more  elaborate  definition  of  orthogonality  is  required  in 

the  scattering  domain  (for  example,  see  Vieira  [1977]).  Because  the 

exchange  step  complements  one  of  the  coupling  terms  during  the  mapping, 
* 

we  find  that  S'  is  J-orthogonal 

P8)  If  e T-Q7  =  I  then  S*  JS*T  =  J 
which  we  can  generalize  by  repeated  application  of  property  (P2) : 

P9)  If  S'*  S’1  =  I  then  S**J*S*T  =  J 
■By  application  of  property  (P6),  we  also  have 
P10)  If  er-e1  =  I  then  S7*  = 

Triangular  Layers 

We  define  a  lower  triangular  layer  to  be  a  layer  of  the  form 


where  A  and  D  are  lower  triangular,  and  C  is  full.  We  define  an 
upper  triangular  layer  similarly: 


U*  = 


0's 


’\d 

0  I  0  s  ^ 

I 


Property  Pll) 

The  exchanged  version  of  a  triangular  layer  is  a  triangular  matrix. 
This  follows  from  an  exchange  step  (pivoting  about  A)  applied  to  L* 
or  U* 


L  = 

\  o  ; 

a"}'^ _ o_ 

IT  = 

_i< 

A  -  , 

0  v  1  _A-1B 

CA-*  !  ' n  ® 

Urt  t  D  S  ^ 

n  rN-D" 

0  !  0  *  N 

L  1  u 

L  and  U  are  triangular  because  the  inverse  of  an  upper  (lower) 
triangular  matrix  is  upper  (lower)  triangular,  and  because  D  remains 
unchanged. 

Property  P12) 

There  exists  a  J-orthogonal  layer,  &j>  that  maps  any  layer  into  an 
upper  or  lower  triangular  layer  (i.e.,  removes  reflections  from  one  side) 
To  show  the  existence  of  an  (f*,  such  that 

*  *  * 

A  *®J  =  T 


we  choose  &  such  that 
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A  *  O'  =  T 


then  O' 


Property  PI  3) 

If  we  lower  triangularize  a  layer  in  the  scattering  domain 


A  *  0  =  L 


*E 

then  the  equivalent  transformation.  O'  ,  applied  in  the  transition 
matrix  domain  also  yields  a  lower  triangular  matrix. 


F.igenlayers  -  Property  P14) 

From  Wilkinson  [1965],  if  the  eigenvalues  of  A  are  distinct, 
then  there  exists  a  similarity  transform  such  that 

A  =  X *diag(X .. )  *X  *  (6.5) 

where  the  X^  are  the  eigenvalues  of  A,  and  the  matrix  X  consists 
of  the  right  eigenvectors  of  A;  the  i  column  of  X  corresponds 
to  the  i**1  eigenvalue. 

It  then  follows  that  in  the  scattering  domain,  any  layer  can  be 
decomposed  into  an  eigenlayer,  a  diagonal  layer,  and  an  inverse  eigen- 
layer  _ 


*  * 
A  =  X 


,-l 


* 


0  X 


which  has  the  important  effect  of  introducing  a  layer  without  any 
reflection  coefficients.  (Note  that  zero  eigenvalues  can  be  interpreted 
as  "transmission  zeroes."  ) 


tgfr-fitp 
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In  the  case  of  repeated  eigenvalues  property  (P14)  can  be  gener¬ 
alized  using  Jordan  (block)  form,  thus 


A  =  X 


X  ,  where  J. 


X.  1  0 

l 


and,  in  the  scattering  domain  (assuming  appropriate  partitioning) 


1  0_) 

*  ^  i  - 1  * 

A*  =  X  *  --’-i—  *  X 

0  |  "J 


where  the  reflection  coefficients  of  the  eigenvalue  layer  are  zero. 


Summary  of  Scattering  Operation  Properties 
PI)  I  =  I* 


P2)  J  =  J* 
J-1  =  J 


J’A  =  J  *  A 


*  *  * 


P3)  A*B  =  C  =>  A  *B  =  C 


*  *  * 


P4)  A  *B  =  C  =>  A.B  =  C 
PS)  A*A_1  =  I  =>  A*A_1  =  r 
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If  =  I  ,  then 

*  T  * 

P6)  er  *(e  )  =  i 

P7)  0*&T  =  I 
P8)  0*  • J*0*T  =  J 

*  *t 

P9)  er  *j *0  =  j 

T*  *T 

PIO)  0  =  J*0  *J 


Triangular  Layers 
★ 

P 1 1 )  If  T  is  an  upper  (lower)  triangular  layer,  then  T  is 
an  upper  (lower)  triangular  matrix. 

P12)  *ffj  =  T  ,  A  full,  T  triangular,  upper  or  lower. 

P13)  A  *  (fj  =  T  =>  A *0  =  T,  T  triangular. 

Eigenlayers 

P14)  If  A  =  X  diag(A^)X  *,  where  X  are  the  right  eigenvectors 

of  A,  then 

★  * 

A  =  X  * 


A.  0 
0  'A 


,-r 


- 
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EXAMPLES 


To  illustrate  the  utility  of  the  scattering  context,  several 
examples  will  be  developed  in  scattering  terms.  The  first  example, 
the  Hamiltonian  system,  was  presented  in  Section  III. 3.  The  basic 
equations  (3.1)  and  (3.7)  (already  in  the  scattering  domain) 

Vi  =  *V  rB'lrTxi+i  »  x0  given 

A.  =  <^A.  ,  +  A  x.  ,  A  =  xTA  (6.5) 

x  l+l  inn 

are  presented  in  Figure  6.4  as  a  scattering  layer.  (Since  this  is  a 

control  problem,  time  through  the  layer  is  reversed.)  We  can  now 

convert  from  the  scattering  domain  back  to  the  transition  domain,  pivot- 
T 

ing  about  <j>  ,  producing  Figure  6.5,  and  yielding  equations  (3.12). 


M* 


(j)+rB~1r1VTA  -rB~1rT<i>  T 

_-<rTA  <Tr 


M 

(6.6) 


where  X^,  A  ,  and  A^  are  the  unstable  eigen  -elements .  Applying  an 

* 

exchange  step  gives  M 
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X-1 

-X_1x 

rx-1 

0 

U 

U 

s 

* 

u 

A  X-1 

A  -A  X_1X 

0 

X 

_  u  ii 

s 

u  u  s 

_ 

s 

(6.8) 


The  sequence  of  layers  Mq*M^*M2*. . . *M^  appearing  in  Figure  6.6a  can 


^  ^  'fc 

therefore  be  reconfigured  as  in  Figure  6.6b,  i.e.,  (E^*Dj*Ej 

_  1  *  *  *  i  * 

E  )*. . .*(E  *D  *E  ) . 


"2  '  ’  ’  •  -  n  n  n 

Two  simplifications  immediately  obtain.  First,  since  the  Hamil- 


-1  _i *  * 

toman  is  symplectic,  X^  =  X^ .  Second,  since  E  *E  =  I  from 


property  (P4) ,  many  adjacent  layers  cancel.  This  leads  to  the  very 


*  *n  -1* 

simple  structure  of  Figure  6.6c,  i.e.,  E  *D  *E 


Since  Xg  are  the  stable  roots  of  the  system  and  are  therefore 


less  than  unity  in  magnitude,  as  n  goes  to  infinity  (Xs)n  =  C^u) 


-n 


goes  to  zero;  if  we  consider  an  infinite  number  of  layers,  there  is 
no  transmission  through  the  network  (i.e.,  the  two  boundary  eigenlayers 


are  decoupled) .  The  reflection  coefficient  relating  the  state  to  the 


adjoint  variable  X.  is  therefore 

l 


,-l 


X.  .  =  A  X  x.  ,  =  P  x.  , 

i*l  u  u  1+1  1+1 


as  expected. 

This  approach  introduces  an  interesting  alternate  solution  to 
the  normal  eigenvector  approach.  Note  that  constructing  the  Hamiltonian 
in  the  scattering  domain  does  not  require  inverting  the  state  transition 
matrix.  Therefore,  if  an  eigenvalue  decomposition  routine  could  be 
written  to  calculate  the  scattering  domain  eigenvectors  directly,  the 
pivot  step  (6.6)  could  be  avoided.  This  has  not  yet  been  fully  pursued. 
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Another  estimation  example,  which  is  important  in  the  sequel,  intro¬ 
duces  the  propagation  of  error  covariance  matrices  in  the  scattering 
framework.  From  Friedlander  [1976],  we  note  that  we  can  update  the 
forward  error  covariance,  and  the  inverse  smoothed  error  covariance,  by 
constructing  a  layer  as  shown  in  Figure  6.7  and  cascading  it  with  the 
layer  shown  in  Figure  6.8,  The  relationship  between  the  scattering 
matrices--  the  accumulated  covariance  of  Figure  6.7  and  the  update  of 
Figure  6.8  --  and  the  transition  matrix  domain  can  be  clarified  by 
examining  the  normal  update  equations  for  the  requisite  quantities. 

For  example,  for  the  estimate  error  covariance,  we  have: 

P(t+l|t)  =  4>P(t|t-l)«J>'+rQrT-«()P(t|t-l)HT(R+HP(t|t-l)HT)-1HP(t  |t-l)<f>T 

(6.9) 

From  the  star  product  definition.  Equation  (6.1),  we  see  this  must  be 
converted  to  the  form 

pp  =  p2  +  a2  pi  [I-r2p1l1°t2  (6.10) 

which  follows  immediately,  after  applying  the  matrix  inversion  lemma 

(A+BCD)'1  =  A"1  -  A"1B(DA-1B+C”1)-1DA~1  (6.11) 

to  give 

P(t  +  l|t)  =  TQrT  +  4>P(t  j  t  - 1 )  [I  +  HTR~  1HP(t  1 1-1)  ] -1<J)T  (6.12) 

Similarly  for  <p  ,  the  closed  loop  Kalman-filter  transition  matrix, 
we  can  write 

4>0(t+l  ,0)  =  <|>[I  +  P(t  1 1 - 1 ) HTR~ 1 H] <f>o ( t  ,0) 


(6.13) 
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which  is  exactly  analogous  to  the  star  product  quantity  of  (6.1) 


a0  =  a2  tI_plr2^  1&1 


(6.14) 


And  finally,  we  can  also  present  the  equation  for  W.  We  shall  be 
discussing  the  significance  of  W  in  the  next  section.  W  is  basically 
the  smoothed  error  covariance  ^  ,  which  we  can  write  as 


W(t+l|t)  =  W(t  1 1-1)  +  <j)^(t,0)HTR"1H[I  +  P(t|t-l)HTR"1H]“14)o(t,0) 


(6.15) 


again,  analogous  to  the  star  product  quantity 


rQ  =  rj  +  ax  r2  (I-p^)-^  (6.16) 

The  connection  between  the  equations  (6.9)  through  (6.16)  and  the 
scattering  diagrams.  Figures  6.7  and  6.8,  follow  immediately. 

We  shall  return  to  examine  this  example  in  close  detail  because 
of  one  striking  observation.  As  Friedlander  et  al.,  noted  [1976],  if 
we  combine  two  "accumulation"  layers,  instead  of  an  "accumulation"  and 
an  "update"  layer,  we  generate  a  very  interesting  sequence.  Beginning 
with  covariances  at  time  t=0  we  generate  the  covariances  for  t=l, 
t=2,  t=4 ,  t=8,  ...;  at  each  iteration,  i  ,  we  generate 


P(21)  ,  W(2L)  ,  and  4>0(21D 

Given  a  constant  parameter  system  and  the  objective  of  calculating  the 
steady-state  error  covariances,  this  approach  would  then  converge 
exponentially  compared  to  the  previous  iterative  algorithms.  This  is 
the  subject  of  our  next  few  sections. 
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III. 7  Square  Root  Doubling 

We  return  again  to  another  class  of  algorithms  for  solving  the 
steady  state  Riccati  equation.  In  this  case,  we  partition  an  interval 
of  data  into  subintervals.  We  then  make  estimates  of  the  states  within 
each  interval  based  only  on  data  from  the  subinterval.  Then,  global 
calculations  are  used  to  combine  information  from  subintervals  to 
obtain  optimal  estimates  at  the  subinterval  endpoints;  in  this  fashion 
subintervals  can  be  "recombined"  to  yield  the  answer  over  the  entire 
interval . 

Besides  the  work  by  Morf,  Dobbins,  Friedlander,  and  Kailath 
[1978],  various  others  have  expanded  on  this  idea.  Following  Womble 
and  Potter  [1975]  and  Lainiotis  [1976],  Bierman  and  Sidhu  [1977]  focused 
on  doubling  formulas  for  the  continuous  case.  We  shall  give  an 
algebraic  derivation  for  the  discrete  case,  using  the  scattering  theory 
of  the  last  section,  and  then  develop  a  pure  scattering  framework  for 
rapid  derivation  of  equations. 

Schematically,  we  can  picture  the  interval  of  data  in  Fig.  7.1 
being  partitioned  as  in  Fig.  7.2.  Focusing  on  the  interval  from  j  to 
k,  Morf  et.  al.  [1978]  noted  that  the  Markov  property  of  the  {x} 
process  implies  that  for  the  purpose  of  state  estimation  outside  the 
observation  interval,  the  interval  can  be  summarized  in  terms  of  four 
quantities  (two  "boundary"  state  estimates  and  their  covariances) : 


Figure  3.7.1  ORIGINAL  INTERVAL 


Beginning  with  the  complete  interval. 


+ 


+ 


Solve  over  each  interval  for  the  endpoint  state  information  describing 
that  interval, 

| - 11 - HI - II - II - 1  •  I - II - II - II - II - 1 


Combine  state  information  at  every  other  adjacent  endpoint  to  form 
doubled  intervals  twice  as  long, 


HI - II -  •  •  - II - W 


Continue  combining  intervals,  until 


h 


w 


the  salient  characteristics  for  the  entire  interval,  from  t  =  0  to 
t  =  N  ,  are  determined 


Figure  3.7.2  INTERVAL  SEGMENTATION 
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1)  Xk||j  k  ^  >  the  predicted  estimate* 

2 j  Pr  i  j- j  ^  i ]  ,  the  predicted  error  covariance 

A 

3)  x  .  ,  r  ,  ,  the  smoothed  estimate 

4)  P . i r ■  ,  ,  the  smoothed  error  covariance 

3lL3,k-l] 


and  one  relation,  describing  the  closed— loop  state  transition  matrix 
<t>  (j  ,k-l) ;  this  interval  transition  matrix  relates  the  state  information 
at  one  endpoint  to  the  state  information  at  the  other  endpoint  (see 
Figure  7.3). 

The  scattering  domain  will  be  used  to  quickly  derive  the  algorithm, 
but  considerable  insight  can  be  gained  by  examining  the  theoretical 
basis  for  the  algorithm.  If  we  begin  by  assuming  the  value  of  the 
state  at  the  initial  endpoint,  x  ,  is  known,  then  we  have 


Xk|x.,[j,k-l]  =  4>0(j,k-l)x.  +  xk|x.  =  0,  [j  ,k-l]  (7‘1} 


where  <*  is  the  closed-loo^  state  transition  matrix  for  the  Kalman 
o 

filter,  assuming  the  initial  error  covariance  is  zero.  That  is,  the 
best  estimate  of  x  at  the  endpoint  is  the  sum  of  the  best  estimate 
assuming  zero  initial  conditions  plus  the  known  initial  condition, 
propagated  (via  <p  )  over  the  interval. 

Assuming  we  don't  know  x_.  ,  the  best  we  can  do  is  to  use  the  best 


estimate  of  x.  ,  x  -  i  r  •  i  ,i  ,  to  give 
3  3  I U  »k_l] 


N  A  A 

Ck|[j,k-1]  ~  <Vj’k‘1)xj|[j,k-l]  +  xk  |  Xj  =0 ,  [j  ,k-l] 


(7.2) 


Recall  that  by  x^|jj  ^  we  mean  the  estimate  x  at  time  k  given 
observations  at  times  j  through  k-1. 


Similarly,  we  can  write  the  associated  error  covariance  as 


Pk|[j,k-1]  "  *0(j*k'1)Pk|[j,k-l]<|,o(j’k-1)  +  Pk|x.,[j,k-1] 

(7.3) 

/*•. 

If  we  write  the  other  interval  equations--  for  x r.  ,  ,,  and 

Pj  ]  j-j  ^  jj--  then  we  would  be  ready  to  consider  combining  \|[j  ^  i] 

A  A 

with  xj  |  j-j,  ^ _^-j  to  produce  |  [^  n  j]  •  Morf,  et  al  . ,  develops 
this  formulation. 

Another  approach  is  to  recognize  that  these  equations  and  quantities 
are  precisely  those  terms  we  were  treating  in  the  last  example  of  the 
previous  section.  Making  the  association  with  Figure  6.8,  we  can 
construct  Figure  7.4  and  write  down  the  equations  immediately.  Assuming 
time  invariant  coefficients,  we  have: 


<C0U)  P(i) 

^o(i) 

P(i) 

<j>o[I-PW]'\ 

p+^pfi-wp]'1^ 

W(i)  (i) 

* 

W(i) 

T 

V«J 

w+ch^iv  [  i -pv\T] _  14>0 

T  -  -IT 

<t>;[i-wp]  r0 

(7.4) 

Making  a  change  of  variables,  of  W  =  -W  ,  we  get 

(a)  P ( 2 i )  =  P(i)  +  t}>o(i)P(i)  [I+W(i)P(i)]"1<j)^(i)  (7.5) 

(b)  W C 2 i )  =  W(i)  +  <J>^(i)W(i)  [I+P(i)lV(i)]-1(f>o(i) 

(c)  4>0(2i)  =  4>o(i)  [I  +  P(i)W(i)]'1<))o(i)  . 

Recall  that  the  objective  is  to  derive  a  square  root  algorithm  for 
propagating  these  equations.  Initially,  we  will  guess  that  (7.5c)  can¬ 
not  be  reduced  to  a  square-root  form  unless  <j>Q  is  symmetric.  [In 


mmrnmm 


the  next  section  we  will  gain  some  insight  into  why  this  is  the 
case . ] 

The  other  two  equations,  (7.5a)  and  (7.5b)  are  close  to  the  form 
we  could  readily  recognize  and  exploit;  we  need 


A(2i)  =  A(i)  + 


[A^-jo]  =  [A?jBT1]  •  er 


where  A2  is  lower  triangular,  and  (T  is  an  orthogonal  matrix, 
constructed  so  as  to  triangul arize  the  matrix  [A^  -,BT^ ]  . 

In  the  derivation  that  follows,  if  the  subscript  is  missing  on  a 
time-varying  quantity,  "i"  is  assumed.  Beginning  with  the  equation  for 


P(2i)  from  (7.5), 


P(2i)  = 


P  +  <J)  P[I  +  WP]_1(J)T  (7.8) 

o  1  o 

p  +  $  P^PT,/2[I  +  WP.  ]~1P~T/'2PT/'2<pT  ,  where  P'2PT//“=P 
O  1  o 

p .  *  P^I .  fVW'WrV'V 


=  P  +  <p  P^[I  +  XxW/y  ,  where  X=PT,/2W'2 
o  1  o 


and  similarly,  for  W(2i) 


>i)  =  IV  +  *W'2[I  +  PW%T/'2]~1<t 


=  w  +  *T*\l  +  wt/2pV/2w^]-1wt/2^ 

Yo  L  J  o 

=  w  +  <t>Tw^[i  +  x\]-W24 

To  1  o 


From  Section  III. 4,  we  compute  [I  +  XX  ]  by  choosing  9^  so  that 


0^  is  orthogonal. 


»  x/T ! 


and  so  that  it  triangularizes  [I;X  ] 


[I:XT]»/1  =  [Rio]  R  = 


Then 


[I  +  XTX]  = 


[I  +  XXT]  = 


(7.10) 


R  =  T, 


x  =  t21t: 


Rewriting  (7.8),  we  have 


P(2i)  =  P  +  (J>oP^T2T2PT/2({)^ 


(7.11) 


We  now  choose  a  second  orthogonal  transform,  <%  ,  so  that 


(7.12) 


As  long  as  P  has  at  most  n  columns,  any  orthogonal  mapping,  , 
■will  suffice.  (Triangularizing  mappings,  however,  reduce  later 
computations . ) 

Recalling  the  properties  of  orthogonal  transformations  we  outlined 


in  Section  III. 4,  we  can  now  write 
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- 

r 

- 

- 

- 
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01 

0 

• 

I 
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0 
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0 

^2 

-T 

T1 

0 

•1 

• 

I 

n 

0 

'•4 

<f>  P  ■  , 

o  l  21 

<h  p  S 

0  1 

P2 

p? 

1 

0 

°2_ 

. 

a.  -T 

T1 

0 

0 

— 

M1 

<j>  P.T01 
Yo  l  21 

P2i 

0 

(7.13) 


(7.14) 


Since  0^  and  were  arbitrary,  except  for  their  orthogonal  nature, 

we  note  that  we  can  replace  them  with  any  orthogonal  mapping  that  yields 


(7.14) 


r  -t 

Ti 

4>  P.2T_. 
o  l  21 

0 

O 

i  i 

\  0,  o  , 

r— 

o 

M!  = 

ph 

*2i 

o 

A 

II 

V 

x__'l _ 1 

x  1 

X  V  ' 

I  s 

0 

-T  k 

We  will  soon  see  that  these  quantities,  and  <^0Px'2 1  *  are  imPortant 

for  calculating  W(2i)  and  <J>0(2i)  ;  in  the  next  section  we  shall 
examine  why  this  might  be  anticipated. 

To  calculate  W(2i)  of  equation  (7.9),  we  already  have  most  of 


the  quantities  we  need: 
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W(2i)  =  W  +  4>TW^ [ I  +  XTX]  _1WT/24) 


(7.16) 


w^.wIY2  =  wV/2  +  <j>tw^[i  +  *tx]-W/24> 

2i  2i  ro  L  J  To 


w\T/2  +  4)Tw^t  tTwT//24> 

To  1  1 


To  calculate  P- .  ,  we  needed  W 

2i 


,T/2 


-T/2„»S 


(for  calculating  W'^P2)  .  We 

have  <f>o  and  available,  so  consider  calculating  the  right 

T/2  t- 

square  root  of  W  ,  W  '  ,  instead  of  W2  .  Selecting  our  third 

orthogonal  transformation,  0^  ,  we  have 


^3 

[t>t/2,  1 

1  i  To 

II 

O 

l _ 

A 

II 

V 

o 

-  ...-J 

wT/2 

l 

w^2 

2i 

M 

which  is  the  solution  we  required. 

To  calculate  d>  (2 i)  we  begin  by  noting  the  lack  of  symmetry  in 

equation  (7.5c)  and  the  need  for  calculating  [I  +  PW]  1  ,  which  isn’t 

available  from  previous  calci 
T 

quantities  T^  and  ‘ 21 : 


available  from  previous  calculations.  Instead,  we  have  only  the  two 
T  1 

r  * 


and 


Tj  =  [I  +  xV4* 

=  [I  +  wT//2  PW*5]-45 


j>  P4^,,  =  4>  P^XX^T..  =  <J>  P!sPT^2WT 
o  21  o  21  o  1 


=  4>opw45[i  +  wt'/2pw4s]'t/2 


T  - 1 

We  need  an  expression  for  <J>  (2i)  that  requires  [I  +  X  X] 


l>0(2i)  =  4>0[I  +  PW]  _1<*»o 


W‘T/2[I  +  WT/2PWJ5]"1WT/'2<{. 
o  o 


-T/2 

which  requires  computing  W  .  However, 


so 


w'T/2  =  w-T/2  +  P\{2  -  PW^ 


=  W"J/"[I  +  W  '  PW'2]  -  PW'1 


<Po(2 i)  =  <Po(W~T/2[I  +  WT/2PW^J  -  PW^)([ r  +  WT/2PW3s]‘1WT/2(j)o) 


<p  -  <p  pw^[i  +  wT//2pw?5]"1wT//24> 


0  0  o 


<p  <p  -  <p  p\?  tTwT/2c|> 

To^o  ro  21  1  Yo 


as  we  desired. 

For  initial  conditions,  we  look  to  the  initial  layer  for  the 
scattering  problem.  From  Friedlander  [1976]  we  have: 


Pq  -  rqh 

w l/2  .  r-t/2h 


*0  ■ 


Comments  on  the  Strengths  and  Weaknesses  of  Square-Root  Doubling 


Notice  that  this  derivation  leads  to  updating  equations  which 
require  no  explicit  inverses.  If  we  now  introduce  the  convention  used 
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by  Morf  [1978],  whereby  R  2  =  and  K  =  ’  we  ^ave 


I 

wT/V 

0 

R  2 

0 

0 

0 

<J>  p2 

•  w  = 

* 

K 

p35 

0 

Y0 

_ 

. 

2i 

'crW'V 

0 

„T/2 

WT/2 

2i 

*  *  -U  T/7 

<J»0C2i)  =  <j)0(I  -  K  (R  )  2W1/Z)<J>o 

*  P 

The  first  equation  provides  (R  )  2  ;  the  second  and  third  equations 
* 

require  (R  )~2  . 

As  a  second  observation,  note  that  repeated  eigenvalues  pose  no 
problems,  and  that  state  transition  matrices  can  be  singular.  However, 
the  measurement  covariance  matrix,  R  ,  must  be  inverted  to  determine 
the  initial  conditions. 

Morf  et  al.  also  note  the  ready  adaptability  of  this  algorithm  to 
parallel  processing.  Each  processor  can  independently  compute  estimates 
for  a  distinct  interval.  Later,  the  intervals  can  be  joined,  again  in 
parallel  (see  Chapter  ix).  This  procedure  can  be  generalized  to  the 
time-varying  problem. 

III. 8  Scattering  Revisited 

In  the  previous  section  we  showed  how  to  begin  with  results 


from  the  scattering  domain,  and  develop  them  algebraically  into  a  square- 
root  algorithm.  In  this  section  we  will  consider  deriving  algorithms 
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solely  in  the  scattering  domain,  developing  the  concept  of  the  "square- 
root"  of  a  medium,  orthogonal  scattering  layers,  and  the  triangulariza- 
tion  of  a  network.  The  results  are  salutary;  we  can  derive  the  square- 
root  algorithms  quickly,  and  with  new  insight  into  the  intrinsic 
constitution  of  this  class  of  solutions. 

We  switch  now  from  the  controller  application,  presenting  the 
Hamiltonian  for  the  estimation  problem  in  the  scattering  format: 


I  Vi  |  'O' 

— 

- Vi 

"■  II  1 

1 

1  1  1 

‘  \ 

1  | 

1  niw 


■  »T  , 

1  4 

tT  -• 

1  -L  /rY-- 

_  *T 

M-l  ^ 

i 

,  'ob*- 

-  9i*l 

Figure  8.1  Scattering  Hamiltonian  for  the  Estimation  Problem 


In  this  diagram,  a  node  of  the  form 


indicates  a  passive  branch,  where  A  =  B  =  C  .  A  node  of  the  form 


indicates  summation.  We  read  this  block  schematic  exactly  as  we  would 
read  a  transmission  matrix  block  diagram,  e.g.,  for 


u - >  N(z)D  (z)  - >  A(z)  - ^y 

Figure  8.2 

we  would  lift  the  equation  y=A*N*D^u,  preserving  the  order 
of  the  matrix  calculations.  Expanding  on  this  concept,  we  note  that  if 
we  take  the  square-root  of  a  matrix  A 


A  -  \2  AT/2 


this  translates  into  the  schematic 


(8.1) 


“tUb” 


With  this  background  we  are  able  to  exploit  the  strong  symmetry  of 
Figure  8.1. 

We  begin  by  defining  the  transpose  of  a  network  to  be  the  network 
reflected  about  the  defined  line  of  symmetry,  where  all  quantities  (path 
gains)  are  transposed,  and  all  arrows  are  reversed;  this  last  condition 
exchanges  all  branch  and  summation  nodes. 

We  then  define  the  square  root  of  a  network  to  be  a  network  which 
has  a  line  of  symmetry  such  that  when  divided  along  the  line  of  symmetry, 
one  partition  is  the  transpose  of  the  other.  Looking  at  Figure  8.1,  we 
note  that  one  square  root  of  the  scattering  network  will  be  as  shown  in 
Figure  8.4  . 


m 


=  aT/2 

f. 

3 

i 

a 

J 

i 

Figure  8.3 

j 

TV*  * 
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The  square-root  operation  introduces  complex  arithmetic,  which  appears  to 
be  an  added  complication  compared  to  the  algebraic  approach  presented 
earlier.  The  complication  is  ephemeral,  however,  arising  because  of 
the  minus  sign  injected  by  the  exchange  step  (see  Section  I I I. 6). 

Examining  the  connections  at  the  line  of  symmetry,  S  ^  ...  Sm  , 
notice  that  any  orthogonal  mapping  could  be  attached  without  altering 
the  terminal  characteristics  of  the  network.  For  example,  if  we 
contemplate  a  simple  orthogonal  network  such  as  a  permutation  layer, 


Sln 

_L 

P 

— ^F-~ 


Figure  3.5 


it  is  clear  that  S.  and  S  .  are  identical--  the  signals  are 
in  out  6 

"crossed"  at  the  border,  and  then  uncrossed.  Any  mapping  with  this 
characteristic  would  suffice.  Recalling  from  Section  (III. 6),  property 
(P7), 

ir  •  e7  =  i  =>  0-  *  &T  =  i 

any  orthogonal  mapping  is  a  candidate.  However,  we  are  interested  in 
orthogonal  mappings  which  correspond  via  an  exchange  step  to  transmission 
domain  mappings.  From  property  (P9) ,  these  were 

=  i  =>  e-  *  j  *  (o' ) 1  =  j 

The  J-unitary  property,  in  this  case,  is  an  imposed  boundary  condition. 

It  also  has  the  welcome  effect  of  eliminating  the  complex  arithmetic. 
Thus,  Figure  8.4  becomes 


Figure  8.6  J-Unitary  Mapping 


with  the  border  we  require.  Henceforth,  the  J- layer  boundary  will  be 
maintained  as  a  boundary  condition. 
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When  using  the  star-product,  we  combine  two  layers  into  a  single 
layer.  We  have  a  similar  objective  when  working  with  scattering  square- 
roots:  to  transform  a  two- layer  network 


Figure  8.7  Square- root  Star-product  Definition 


into  an  equivalent  single  layer 


Figure  8.8  Square-root  Star-product  Definition 


If  we  consider  only  terminal  behavior  (in  the  network  theory  sense), 

★ 

then  by  introducing  a  J-orthogonal  layer  0  to  Figure  8.7  we  get 


Figure  8.9  . 
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Figure  8.9  Detail  of  Combined  Layer 

* 

At  the  terminals,  the  mapping  O'  has  the  desired  effect  of  producing 
a  single  layer. 

* 

To  understand  the  structure  of  O'  ,  we  consider  decomposing  it 
into  two  concatenated  mappings 

O'*  =  O'*  *  O'*  (8.2) 

* 

6^  will  be  used  to  compress  two  unidirectional  paths  (for  example, 

★ 

two  downward  paths)  into  a  single  path;  0^  will  be  used  to  uncross 

* 

the  paths  we  find  crossed  in  O'  of  Figure  8.9  . 

The  first  step  is  to  demonstrate  that  unidirectional  scattering 
layers  (where  one  direction  neither  transmits  nor  reflects)  can  be 
mapped  into  a  triangular  layer.  We  begin  by  considering  a  structure  of 
the  form: 


Figure  8.10  Simplified  Boundary  Mapping 


Reorganizing  and  rotating  this  schematic  into  canonical  form  (as  a 
standard  scattering  diagram) ,  we  have 


Figure  8.11  Combined  Layer  in  Canonical  Form 


* 

The  zero  transmission  and  reflection  gains  of  Mg  make  this  an  especially 
simple  scattering  layer  to  work  with. 

Conceptually,  we  can  say  that  the  rank  of  the  signal  at  C  is,  at  most 
equal  to  the  rank  of  the  signal  at  In  .  We  saw  earlier  that  orthogonal 
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matrices  can  be  used  to  consolidate  rank  by  triangularizing  and  pro¬ 
ducing  columns  (or  rows)  of  zeroes;  any  orthogonal  triangularization, 
_  * 

0*  ,  can  be  used  to  reduce  Mg  to  consolidated  (triangular)  form. 
Aggregating  the  non-zero  transmission  gains  produces 


V 

o' 

* 

0 

0 

_ 

_ 

L 

B1 

X 

B2A1 

0 

B3A2A1 

0 

0 

.B4A3A2A1. 

0 

A3A2A1 

0 

A. 

(8.3) 


*1 


M* 


* 

The  need  for  0^  of  (8.2)  arises  when  we  notice  that  the  scattering 
problem  would  segment  into  two  unidirectional  layers  (flowing  in  opposite 
directions)  with  the  elimination  of  the  loop,  A  ,  in  Figure  8.9  . 

This  gives  rise  to  the  second  application  of  orthogonal  triangularization, 
the  application  discussed  in  Section  I I I. 7  .  We  need  to  unwind  the 
following  loop: 


i  i 

_ 

Figure  8.12  Definition  of  Loop  Unwinding 


(Note  that  the  lower  square-root  layer  was  introduced  for  clarity.) 


If  we  consider  In  and  Out  to  be  one  side  of  a  layer,  this  layer 
can  be  transformed  into  canonical  form  by  re-ordering  the  summation 
computation  and  determining  the  transfer  functions  between  the  terminals 


Finding  this  mapping  is  equivalent  to  asking  whether  an  orthogonal 

* 

transformation  can  always  be  applied  to  an  upper  triangular  matrix  U 

* 

to  map  it  into  a  lower  triangular  matrix  L  .  The  mapping  exists,  as 
was  demonstrated  in  Section  III. 6 

k  k  k  k  k 

P12)  3  Or >■  A  *  &  =  T  ,  T  triangular,  upper  or  lower. 

★ 

In  the  first  application  of  the  orthogonal  layer  O'  (Figure  8.10), 
the  structure  of  the  layer  was  unimportant.  The  side-effect  of  producing 
a  desired  terminal  behavior  made  the  layer  useful.  In  the  present  context, 
the  structure  of  the  mapping  and  the  values  resulting  are  important;  it 
is  necessary  to  determine  the  parameters  a,  g  and  y  of  Figure  8.14b  . 

★  k 

These  parameters  can  be  determined  in  terms  of  and  U  ,  but 

it  is  easier  to  work  in  the  transition  domain.  The  parameters  could  then 

be  determined  from  the  results  of  Section  II 1. 4,  or  from  a  comparison  of 
* 

L  and  L  as  derived  in  Section  III. 6,  where: 

*  *  * 

u  *  er2  =  l 


Therefore,  applying  an  exchange  step  gives  the  parameters  in  the 
desired  format : 


★ 

The  result  oi  applying  the  orthogonal  transformation  0^  to  Figure 
8.12  to  obtain  Figure  8.14b  can  be  drawn  as  in  Figure  8.15  . 
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and  finally  collapse  the  boundary  connections  using  orthogonal  transforms 


Figure  8.20  Application  to  Square  Root  Doubling 


We  can  now  read  out  the  requisite  equations  required  to  calculate  the 
entries  of  Figure  8.17.  These  equations  follow  directly  from  our 
previous  rules : 


WT/2(2i):  V2  • 


wT/2 

T 

1  Yo 


W. 


rT/2 


2i 

0 


PJ'2(2i):[pI'2|4»oPJ5T2]  •  0^  =  [p!*.;0] 
<K2i):  ♦_(2i)  =  $  [I  -  P**T71T V/2]4>0 


(8.6) 


(8.7) 


(8.8) 


Comment : 


This  approach  clarifies  several  issues.  The  calculation  of  P 
and  W  ,  which  cut  across  lines  of  symmetry,  is  fundamentally  different 
from  the  calculation  of  <pQ  ,  which  lies  along  a  transmission  path. 

The  transmission  path  variables  are  not  subject  to  calculation  via 
orthogonal  transformations,  although  terms  arising  from  the  loop  removal 
at  A  are  useful . 

There  are  two  distinct  applications  of  orthogonal  transformations: 
an  explicit  application  for  loop  removal,  and  an  implicit  application 
to  compress  the  structure.  The  latter  application  will  involve  trans¬ 
formations  both  from  the  left  (for  an  output  border  junction),  and  the 
right  (for  an  input  border  junction). 

Application  II—  Normal  Square  Root 

We  begin  with  another  of  Friedlander's  [1976]  layer  ensembles, 
this  one  specified  for  propagating  the  Riccati  equation  by  one  time 
step.  (See  Figures  5.7  and  5.8.)  We  have 


Figure  8.21  Application  to  Normal  Square  Root 


We  take  the  square  root  of  the  network,  and  apply  the  appropriate  loop 


removal 


Figure  8.22  Application  to  Normal  Square  Root 


Again,  we  can  read  the  equations  from  Figure  8.22  directly: 

P^Ci+l):  [*ipir2;riQ?!l  •  0-3 


WT/2(i+i):  & 


WT/ 2 


tTrT^H.  ip  (i) 

1  1  irov  ’ 


(8.9) 

(8.10) 


<p  (i+1)  :  <J>.  [I  -  P'?T_1tTr7'1H.  ]cf>  (i) 
Yov  1  1 1  1  21  1  1  ijyo 


(8.11) 


As  we  found  earlier  (Section  III. 5),  P2  can  be  computed  indepen- 
T/2 

dently  of  W  and  <J>  ;  the  first  equation  can  be  propagated,  and 


the  second  two  ignored. 


94 


III. 9  Algorithm  Revisions 


Eigenvector  Decomposition 

Several  problems  were  noted  with  eigenvector  decomposition: 

1)  The  Hamiltonian  must  have  a  full  set  of  eigenvectors,  which  does 
not  always  occur  when  there  are  repeated  closed  loop  eigenvalues, 

2)  The  measurement  noise  matrix  cannot  be  singular;  R  (or,  in  the 
controller  case,  B)  must  be  invertible, 

3)  The  state  transition  matrix  cannot  be  singular. 

Several  extensions  have  been  suggested  to  solve  the  first  problem. 

Paul  Van  Dooren  [1978]  has  suggested  using  Stewart's  [1973]  results 
on  repeated  eigenvalues.  Stewart  notes  that  the  subspace  spanned  by 
the  eigenvectors  corresponding  to  a  cluster  of  eigenvalues  of  a 
Hermitian  matrix  is  relatively  insensitive  to  perturbations  in  the 
matrix.  Using  these  results.  Van  Dooren  suggested  looking  for  a  basis 
of  the  subspace  directly,  without  computing  eigenvectors  (see,  for 
example,  Laub  [1979]]. 

The  second  problem,  arising  because  of  a  singular  measurement 
covariance  matrix,  can  be  solved  using  the  technique  of  Bryson  and 
Henrikson  [1968]  and  Gevers  and  Kailath  [1973].  A  singular  matrix 
implies  that  after  a  finite  time,  several  of  the  states  can  be  deter¬ 
mined  precisely.  We  can  therefore  remove  these  states  from  the  estima¬ 
tion  process,  reduce  the  order  of  the  system,  and  derive  a  system  with 
a  non-singular  R  .  In  Chapter  V  we  will  discuss  Gever  and  Kailath's 
work  which  gives  conditions  for  determining  when  this  reduction  is 
possible. 


These  algorithms  do  not  suffer  from  the  repeated  eigenvalue  problem 
or  the  transition  matrix  singularity  problem.  They  do,  however,  require 


an  invertible  measurement  covariance  matrix,  R  ,  or  control  weighting 
matrix,  B  .  This  can  be  seen  quite  clearly  from  the  scattering  picture 
or  from  the  square  root  equations;  both  approaches  require  inverting 
this  quantity.  As  with  eigenvector  decomposition,  the  predictable 
directions  need  to  be  removed  from  the  system. 
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Table  3.2  BASIC  ALGORITHM  OVERHEAD 


OPERATION | | Operators 

+  - 

X 

T 

/— 

m  P 

■□•d 

n(m-l)p 

nmp 

m  m 

■  1\ 

j  m(m-l) 

j  m(m+l) 

m 

n0  *  jdb 

—  B  =  A 

j  (n+1) (m-1) 

j  m(m+l) 

Kn  n 

n[\  '  tk 
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n  n 
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n 

Chapter  IV 

THE  STEADY-STATE  SOLUTION  TO  THE  CONTINUOUS  RICCATI  EQUATION 


In  the  previous  chapter  we  examined  several  diverse  approaches 
to  solving  the  discrete-time  Riccati  equation;  in  this  chapter  we  will 
consider  the  continuous  equivalent.  The  focus  of  this  dissertation  is 
on  discrete  solutions.  This  chapter  will  therefore  be  brief,  principally 
discussing  iterative  solutions  to  the  continuous  problem. 

We  again  consider  two  applications:  estimation  and  control.  In 
exactly  analogous  fashion  with  the  discrete-time  case,  the  solution  of 
these  two  problems  requires  the  solution  of  one  of  two  related  differen¬ 
tial  equations.  The  continuous  problem  begins  with  the  definition  of 
the  equivalent  state-space  model: 


x(t)  =  F (t)x(t)  +  G(t)  u(t) 
y (t)  =  H(t)x(t)  +  v(t) 


(1.1) 


For  the  estimation  problem,  we  assume  u(t)  and  v(t)  are  white  noise 
disturbances 


■mt  y 

[uT(s)  VT(s)]  = 

O' 

o' 

v(t) 

o 

R 

6(t-s) 


(1.2) 


Then,  x(t)  ,  the  linear  least  squares  estimate  of  x(t)  given 
observations  up  to  time  t,  can  be  obtained  via  the  Kalman  filter 
equat ion : 

S(t)  =  [F (t )  -  P(t)HT(t)]*(t)  +  P(t)HT(t)y (t)  ,  S(0)  =  0  (1.3) 


where  P(t)  is  the  covariance  of  the  state  error  estimate 

P(t)  =  F(t)P(t)  +  P(t)FT(t)  +  G(t)Q(t)GT(t)  (1.4) 

-  P(t)HT(t)R'1(t)H(t)P(t) 

=  g  (5c(t)-x(t))(x(t)-x(t))T} 

p(0)  =  no 

This  is  the  estimation  Riccati  equation. 

The  control  problem  begins  with  the  same  state  space  model,  except 
that  perfect  knowledge  of  the  states,  x  ,  are  assumed  (the  output, 
y  ,  is  ignored)  and  the  input,  u  ,  is  assumed  deterministic  and 
available  for  control. 

The  problem  solution  begins  by  defining  a  performance  index,  J  , 
to  be  minimized: 

tf  t 

J  -  {  [x  (t)A(t)x(t)  +  u(t)B(t)u(t)]dt  (1.5) 

o 

where  A(t)  is  non-negative  definite  and  B(t)  is  positive  definite. 
The  optimal  solution,  in  the  sense  of  a  control  input,  u(t)  ,  which 
minimizes  J  ,  is  given  by 

u(t)  =  -B(t)  GT (t)  S(t)  (1.6) 

where  S(t)  is  the  solution  of  the  second  Riccati  equation: 

S(t)  =  S(t)F(t)  +  FT(t) S(t)  +  A(t)  -  S(t)G(t)B-1(t)GT(t)S(t) 
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Whereas  in  the  discrete  case  an  exact  solution  to  the  matrix 
Riccati  equation  was  readily  computable,  in  the  continuous  case  this 
is  often  more  difficult.  For  the  continuous  solution,  it  is  often 
insufficient  to  rely  upon  matrix  fundamentals  and  linear  least  squares 
theory;  a  detailed  understanding  of  numeric  integration  is  required. 

This  takes  us  beyond  the  purview  of  this  thesis;  perforce,  we  shall 
focus  on  those  issues  directly  related  to  the  presentation  of  the 
previous  chapter. 

However,  we  will  be  interested  in  steady  state  solutions  which 

•  • 

means  we  are  interested  in  solutions  where  the  derivatives  P  and  S 
are  identically  zero.  This  constraint  converts  differential  equations 
into  algebraic  matrix  equations.  Therefore,  in  the  sequel,  when  an 
algorithm  requires  integration,  we  should  suspect  that  excessive 
complication  is  being  introduced. 

This  chapter  begins  by  briefly  noting  the  similarities  between 
the  discrete  and  the  continuous  eigenvector  decomposition  routines. 
Treatment  of  continuous  square  roots  is  deferred  to  the  paper  by  Morf, 
Levy,  and  Kailath  [1978],  and  is  not  discussed.  Square  root  doubling 
is  treated  in  terms  of  sundry  proposed  solutions,  with  emphasis  given 
to  their  strengths  and  weaknesses.  One  approach,  the  bilinear  transform, 
is  discussed  more  deeply. 

IV. 2  Eigenvector  Decomposition 

As  in  the  discrete  case,  eigenvector  decomposition  can  be  used  to 
solve  for  the  steady-state  error  covariance.  The  solution  was  described 
by  Potter  [1966],  and  later  implemented  by  Bryson  and  Hall  [1971] 
using  the  QR  algorithm. 
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Unlike  the  discrete-time  algorithm,  the  continuous  version  of  the 
algorithm  does  not  require  the  inversion  of  the  dynamics  matrix. 

Repeated  closed  loop  eigenvalues,  implying  possibly  non-existent 
eigenvectors,  and  singular  measurement  covariance  matrices  still  cause 
problems.  An  extension  of  the  algorithm  is  required,  as  outlined  in 
Section  Nine  of  the  previous  Chapter. 

IV.  3  Square  Root  Doubling 

Implementing  doubling  in  the  continuous  domain  is  conceptually 
identical  to  the  discrete  problem.  The  difference  is  found  in  the 
determination  of  the  initial  interval.  In  the  discrete  case,  we  begin 
with  solutions  for  one  time  step,  assuming  zero  initial  conditions  --a 
straightforward  procedure.  In  the  continuous  case,  we  again  need  to 
solve  for  that  initial  interval.  In  the  latter  case,  the  obvious 
approach  proves  more  difficult. 

A  typical  solution  to  this  problem,  as  outlined  by  Bierman  and 
Sidhu  [1976],  and  reviewed  by  B.D.O.  Anderson  [1978]  involves  a  two- 
stage  procedure:  the  first  stage  requires  translating  the  continuous 
problem  into  an  analogous  discrete-time  problem,  and  the  second  stage  is 
the  solution  of  this  analogous  problem.  One  approach  would  be  to  expli¬ 
citly  solve  the  Riccati  equation  over  the  initial  interval.  The  question 
arises  as  to  how  short  to  make  the  interval;  as  Anderson  notes,  if  the 
interval  is  too  short,  the  recursive  part  of  the  algorithm  may  not 
converge  propertly.  There  is  also  the  added  burden  of  computing  the 
transition  matrix  for  the  interval,  independent  of  whether  integration 
or  power  series  approximation  is  used  for  the  computation. 
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As  an  alternative,  which  avoids  the  computational  burden,  Anderson 
suggests  an  algorithm  by  Roberts  [1971]  and  Denman  [1976].  They  begin 
with  the  continuous  Hamiltonian 


H  = 


-IT 

-F  GR  G 


which  they  iterate  by  computing 


Hi+1 


=  ^(h.+hT1) 


This  leads  both  to  the  steady-state  error  covariance  and  to  a  doubling 
algorithm.  This  approach  was  not  very  promising  because  the 
doubling  algorithms  we  derived  required  the  inversion  of  the  dynamics 
matrix,  F;  in  the  continuous  case,  singular  F  matrices  are  quite 
common.  Possibly  by  using  pseudo-inverses ,  a  more  complex  square-root 
expansion,  or  a  scattering-  domain  derivation,  this  problem  could  be 
circumvented. 

Another  approach  is  to  recognize  that  a  bilinear  mapping  exists 
which  yields  a  discrete  equivalent  to  the  continuous  Riccati  equation; 
it  is  equivalent  in  the  sense  that  both  equations  have  the  same  steady 
state  solution. 

Hitz  and  Anderson  [1972]  present  a  mapping  from  the  continuous  to 
the  discrete-time  domain.  They  prove  that  the  steady-state  solutions 
to  both  the  differential  and  the  difference  equation  are  identical  t  and 
that  as  long  as  a  positive  definite  solution  exists  and  is  unique,  then 
the  discrete-time  equivalent  converges  for  any  nonnegative  definite 
initial  condition. 


-■ji-'.c  • 
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Z-Ct 

Noting  that  the  bilinear  transform,  S  <=  -  ,  considered  with 

6  z+a 

the  Laplace  transform  and  the  inverse  Z  transformation,  establishes 
an  explicit  isomorphism  between  the  spaces  of  square  integrable  func¬ 
tions  and  square  summable  sequences,  Hitz  et  al  develops  the  mapping 
for  the  control  form  of  the  Riccati  equation.  We  will  briefly  present 
the  estimator  equivalent. 

Given  an  algebraic  matrix  equation  for  the  steady-state  estimator 
error  covariance  of  the  form 


FP  +  PFT  +  GQGT  -  PHTR_1HP  =  0  (3.1) 

we  want  to  transform  this  equation  into  the  discrete  equivalent 

4>M$T  +  rQDFT  -  <fMHj(RD+HDMHp)_1HDM$T  =  M  (3.2) 

where  M  and  P  are  exactly  equal.  The  coefficients  of  these  two  equa¬ 
tions  can  be  related  by  either  full  or  square  root  formulas: 


Full 


HD  =  /2  a  H(al-F)'1 

Qx  =  2a(aI-F)_1GQGT(aI-F)"T 

/2 

B  =  ~  HQj 

RD  =  aR  +  ^HQ1HT 
T  T  -1 

rV  ■  Qi  -  B  rd  B 

4>  =  (al+F)  (al-F) -1  -  BTR'1H,) 


Square  Root 

Hd  =  J2  a  H(al-F)'1 
=  /2 a  (aI-F)_1GQ^ 

B  =  ^  HQ*5 

'D 


Rn  =  [ocR^lB] 


tqJ  =  Q^[I  j  JBTRdT/2] 


(3.3) 


<j>  =  (al+F)  (al-F) _1  -  Q^Rj^R^Hp 


where  a  is  real,  greater  than  zero,  and  not  equal  to  any  eigenvalue 


of  F  . 


The  proof  that  equations  (3.3)  do  in  fact  transform  (3.1)  into 
(3.2)  follow  from  a  direct  substitution  and  simplification.  For 
details  of  such  a  proof,  see  Hitz  [1972].  They  also  show  that  any 
solution  for  an  equation  similar  to  (3.1)  is  also  a  solution  for  an 
equation  similar  to  equation  (3.2). 

We  chose  to  implement  this  version  of  the  doubling  because  it  could 
easily  and  quickly  be  incorporated  into  existing  discrete  doubling 
algorithms.  The  overall  algorithm  is  still  a  two  step  procedure,  but 
the  initial  step  is  relatively  fast  compared  to  the  doubling  stage. 

There  are  also  fewer  constraints  on  the  class  of  problems  which  can  be 
solved,  "a"  can  always  be  chosen  so  that  (ai-F)  is  invertible.  We 
may  have  problems  if  Rp  is  not  invertible,  but  in  the  bilinear 
algorithm  is  comprised  of  a  combination  of  all  noise  covariances. 

It  will  usually  be  invertible. 

The  three  main  problems  with  the  bilinear  mapping  are  other 
aspects  of  accuracy, speed,  and  convergence.  We  have  introduced  another 
level  of  computation,  including  an  inversion;  in  some  cases  this  will 
exacerbate  ill-conditioning  .  For  speed,  we  already  noted  the 
handicap  of  added  overhead.  Rate  of  convergence  is  also 
determined  by  the  choice  of  alpha.  These  factors  will  be 
explored  in  Chapter  VI. 

Finally,  we  note  that  intermediate  values  in  this  scheme  have  no 
significance;  only  the  final  value  has  a  continuous  analog.  (In  the 
purely  discrete  case,  the  error  covariance  at  the  "ith"  iteration  corres 
ponded  to  P ( 2 1 ) . )  In  some  contexts,  this  abstraction  might  be  a  dis¬ 
advantage  compared  to  doubling  techniques  that  iterated  continuous-time 


variables. 
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IV. 4  Conclusions 

For  solving  the  continuous,  steady-state  Riccati  equation,  eigen¬ 
vector  decomposition  is  the  favored  approach.  It  is  numerically  well 
behaved,  fast,  and  it  does  not  suffer  from  some  of  the  limitations  that 
make  eigenvector  techniques  the  secondary  choice  for  the  discrete 
problem;  repeated  eigenvalues  and  singular  R  matrix  can  still  be  a 
problem,  however. 

The  square-root  doubling  algorithms  were  all  limited  by  one  of 
three  weaknesses.  The  first  set  of  algorithms  required  an  initial 
solution  to  the  Riccati  differential  equation  over  a  fixed  interval. 

The  second  set  of  algorithms  was  limited  in  the  classes  of  practical 
problems  they  could  solve.  The  third  set  of  algorithms  was  subject 
to  burdensome  computational  overhead,  incurred  while  converting  a 
continuous  problem  into  an  equivalent  discrete-time  problem;  the  result 
was  to  sacrifice  speed  and  accuracy.  The  importance  of  this  overhead 
will  be  considered  later. 

In  the  next  chapter  we  will  consider  the  class  of  problems  which 
differentiate  among  these  algorithms,  focusing  on  discrete-time  systems. 
In  Chapter  VI,  we  return  to  these  algorithms,  presenting  an  empirical 
comparison. 
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Chapter  V 

DIFFERENTIATING  APPLICATIONS 

Most  algorithms  handle  most  practical  problems  most  of  the  time. 

For  this  set  of  problems,  important  questions  involve  criteria  such 
as  speed  of  computation,  storage  space  requirements,  and  computational 
precision  required.  These  issues  are  examined  in  the  next  chapter. 

Another  class  of  problems  have  salient  characteristics  which 
preclude  the  use  of  certain  algorithms.  In  this  chapter  we  will  focus 
on  two  such  classes:  problems  having  singular  state  transition  matrices, 
and  problems  having  repeated  closed-loop  eigenvalues. 

An  interesting  and  important  class  of  problems  arises  when  we 
consider  modelling  discrete  systems  with  pure  delays.  These  delays 
arise  for  a  variety  of  reasons--  sampling  delays,  computation  overhead, 
transport  lag--  and  often  lead  to  models  with  singular  dynamics 
matrices.  As  we  mentioned  in  Chapter  III,  singular  dynamics  matrices 
exacerbate  weaknesses  in  several  algorithms,  so  we  shall  examine  this 
issue  in  detail. 

The  singular  transition  matrix  collection  of  examples  is  related  to 
the  problem  of  designing  reduced-order  observers  because  of  a  conjecture 
that  there  always  exists  a  reduced  order  system  equivalent  to  the  original 
singular  system  [Katz,  1972],  We  therefore  begin  by  considering  Bryson  and 
Henrikson's  work  [Bryson,  1968]  on  reduced  order  estimators .  We  then  consider 
Gevers'  extension  [Gevers,  1972],  After  examining  Brammer  [Brammer, 

1968],  and  Tse  and  Athan's  [Tse,  1970]  work  on  singular  systems,  we 
consider  a  practical  example,  and  then  evaluate  Katz's  conjecture  and 
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Powell  and  Parson’s  [Powell,  1978]  result  in  light  of  our  results. 

In  the  second  section  we  consider  the  case  of  repeated  closed- 
loop  eigenvalue  problems  as  they  arise  in  satellite  design  and  perfect 
measurement  examples. 


V.2  Reduced-Order  Observers 


The  Discrete  Case 

System  order,  'n'  ,  is  a  significant  criterion  in  both  filter- 
design  and  filter  implementation.  Computation  time  is  typically  p^opor- 

•7 

tional  to  n  ,  at  best  (except  for  the  fast  square  root  algorithms) ; 

2  * 

data  storage  requirements  are  often  proportional  to  n  .  Order  reduction, 
when  cost  effective  and  while  preserving  other  performance  properties, 
is  therefore  desirable. 

In  continuous  time  systems,  if  the  output  of  the  system,  y(t)  , 
is  sufficiently  smooth,  then  a  uniquely  identifiable  number,  a  ,  of 
different  projections  of  the  state  are  calculable  without  error  as 
some  linear  combination  of  the  output  and  its  first  a-1  derivatives. 
Hiis  achieves  a  maximal  reduction  in  the  order  of  the  Kalman  filter 
Riccati  equation  [Bryson,  1965]  [Geesey,  1973]  [Gevers,  1972]. 

Bryson,  et  al . ,  noted  similar  computation-reducing  effects  from 
differencing  observations  in  the  discrete-time  case  [Bryson,  1968]. 

Bucy,  Rappaport,  and  Silverman,  noted  that  in  certain  discrete-time 
cases,  differencing  did  not  have  the  same  computation-reducing  conse¬ 
quences  noted  in  the  analogous  continuous  time  case  [Bucy,  1970], 
[Rappaport,  1970].  Gevers  and  Kailath  went  on  to  elucidate  and  explain 
this  difference  in  terms  of  properties  of  the  associated  covariance 
matrices  [Gevers,  1972]. 

Two  differences  between  continuous  and  discrete  systems  lead  to 
these  disparate  results.  First,  that  differencing  operations  introduce 
a  time-delay  between  y(i)  and  y(i-l)  ,  whereas  the  signal  y(t)  and 
its  derivative  y(t)  have  the  same  time  argument.  Secondly,  any 
transfer  function  of  the  form 
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H(z)  =  z'^HjCz)  ,  1  £  k  £  a-1  ,  a  >  2 

produces  the  same  power  spectral  density  and  can  be  realized  by  a 
causal  system.  In  contrast,  in  the  continuous  case,  the  difference 
between  the  degree  of  the  numerator  and  denominator  is  unique  for  all 
rational  factorizations. 

Gevers  went  on  to  consider  a  special  form  of  the  discrete-time, 
one-step  prediction  filter 

x(i  +  l)  =  4>  x(i)  +  T  w(i+l)  6  [w(i)w(j)T]  =  Q  S(i-j) 
y(i)  =Hx(i)  n(i)  =  f  (x(i)x(i)T) 

This  model  often  arises  in  systems  that  have  correlated  noise  terms 
in  their  output  [Bryson,  1968];  the  measurement  noise  term  is  eliminated 
by  state-augmentation. 

As  Bryson  points  out,  the  associated  Riccati  equation  may  then  be 
ill-conditioned,  because  its  dynamic  rank  is  lower  than  its  order. 

This  can  be  alleviated  by  proper  partitioning  of  the  augmented  state 
equations,  but  is  a  potential  hazard  for  blind  application  with  all 
strategies,  including  the  square  root  schemes. 

The  direct  approach  to  the  conditioning  problem,  however,  is  to 
reduce  the  order  of  the  Riccati  equation.  Gevers  showed  that  this 
could  be  done  maximally  only  when 

H(i-k)<Hi-k,i)r(i)  =0  k  =  l,...,a-l 

H(i-k)<J>(i-k,i)  x(i)  =  y(i-k)  k  =  0,...,a-l 

H(i-k)<Hi-k,i)It(i)  =  ,V  T(i-k)<j>T(i,i-k)  k  =  0,...,a-l 

k  <  i  <  N 
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where 

n(i+i)  =  <Hi+i,i)n(i)4>T(i+i,i)  +  r(i+i)QrT(i+i)  ,  W(i)  =  n(i)HT(i) 

Here,  a  is  the  difference  between  the  order  of  the  numerator  and 
denominator  polynomials  of  the  spectral  density  function: 


,  b  zn-a  +  b  1zn-“'1  +  ...  +  b,  z 
Sy(z)  =  H(z)H(z-1)  =  SL - - - 


-n+a 


z  +  a^z 


+  a_  z 
2n 


For  the  time  invariant,  steady-state  case,  we  have 


H  4»1_k  T  =  0 
H  (j>i-kx(i)  =  y(i-k) 

h  (f)1'1^  n  =  h  nT[$k_1]T  , 

n  =  $n<j)T  +  rQrT 


k  =  1, . . . ,a-l 
k  =  0, . . . ,a-l 
k  =  0, . . . ,a-l 


It  is  therefore  necessary  to  identify  the  definite  relative  order  of 
the  system,  a  ,  determine  if  the  above  constraints  are  matched,  and  if 
not,  transform  the  model  to  a  suitably  constrained  model.  Gevers 
demonstrated  this  could  be  done  for  the  single-input,  single-output 
case. 

Another  important  result,  derived  for  the  case  of  no  measurement 
noise  and  single  output,  shows  that  for  singular  <j)  with  completely 
observable  states,  the  system  has  no  predictable  directions.  The 
system  order  can,  however,  be  reduced  by  one;  for  the  system  with  white 
measurement  noise,  this  reduces  the  system  to  the  normal,  unaugmented 
realization  (that  is  still  singular). 
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Design  with  Delays  y 

Singular  j>  Matrix 

A  system  with  a  pure  delay  —  where  the  output  of  at  least  one 
state  is  not  fed  back  into  that  state —  modelled  in  state  space  format, 
yields  a  system  with  a  singular  ([>  matrix.  Assume  is  observable. 

Then,  as  we  saw  from  Gevers'  results,  we  would  not  expect  to  be  able 
to  reduce  the  order  of  the  estimator  by  taking  differences  of  the 
system  outputs . 

An  interesting  alternative  structure  for  the  filter  arises  if  we 
consider  a  special  case.  If  each  state  suffers  the  same  pure  delay 
before  output,  then  the  0  matrix  of  the  filter  can  be  simplified 
by  removing  states  corresponding  to  pure  delays  (see  Figure  5.1). 


Prediction 

The  best  estimate  of  a  state  x  at  time  i+m  ,  x(i+m)  ,  given 
data  up  to  time  i  is  simply  (see,  for  example.  Sage  [1971]) 

x(i+m|i)  =  4)mx  ( i  |  i ) 

where  x (i)  is  the  filtered  estimate  of  x(i)  .  Further,  the  state 
error  covariance  is 

T 

P(i+m|i)  = g  [x(i+m| i)x(i+m | i)  ] 

=  (<j>M)P(i|  i)  (4>m) T  +  E  (*m~1)r  Q  r1^"1"1)1 

i  =  l 

where 

x  4  (x-x) 


Figure  5.1 


ALTERNATIVE  REALIZATIONS  FOR  SYSTEM  WITH  PURE  DELAY  ON 
OUTPUT. 
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The  estimator  for  Figure  5.1  can  therefore  be  constructed  without  the 
m-order  delay  incorporated  in  the  state  transition  matrix.  Outside  the 
feedback  loop,  the  states  can  be  predicted  from  the  past  back  to  the 
present,  for  the  estimator  will  now  be  running  m  cycles  behind  the 
modelled  system  (see  Figure  5.2). 

This  reorganization  may  provide  a  significant  computational  savings, 
especially  if  delayed  estimates  are  satisfactory.  It  may,  however, 
introduce  an  increase  in  complexity,  especially  if  state  estimation  will 
be  used  for  closed-loop  control.  The  additional  complexity  arises 
because  now  delayed  versions  of  the  predicted  states  must  be  fed 
back  into  the  system  (see  Figure  5.3). 

If  all  states  must  be  predicted,  and  later  delayed,  the  augmented 
approach  may  be  more  computationally  efficient.  In  most  cases,  however, 
the  computational  tradeoff  is  between  increased  multiplications  (for  the  aug¬ 
mented  realization)  versus  storage  (for  the  predicted  realization),  with 
storage  being  comparatively  inexpensive. 

These  issues  only  arise  because  we  have  imposed  a  structure  on  our 

A 

estimator.  The  output,  x  ,  is  identical  in  both  cases.  The  computa¬ 
tions,  in  contrast,  are  performed  in  different  coordinate  systems. 

One  realization  can  be  derived  from  the  other,  and  the  minimal  realiza¬ 
tion  (in  operation  count)  may  be  neither  of  those  proposed. 

An  Example 

As  an  example,  we  consider  the  problem  detailed  in  Appendix  C2. 

As  presented,  this  is  a  second  order  system  with  a  pure,  second-order 
delay : 


116 


x(i+l)  = 

"l  T1 

x(i)  + 

r  2  1 

7/2 

.0  i. 

T 

w(i) 


yC  i)  =  [1  0]x (i-2)  +  v(i) 


e(w(i)w(j)  )  =  q  5 (i- j ) 


•  -.T 


e(v(i)v(j)  )  =  r  6(i-j) 


e(w(i)v(j)  )  =  o 


(For  numerical  results,  r  is  assumed  to  be  unity  and  T  to  be 
1  Hertz. ) 

We  begin  by  considering  the  augmented  system  incorporating  the  delay: 


'l 

T 

0 

o' 

r  = 

r  2 

T  /2 

0 

1 

0 

0 

T 

1 

0 

0 

0 

0 

0 

u 

0 

1 

0 

0 

— 

H  =  [0  0  0  1] 


This  problem  is  in  the  standard  state-space  format;  in  addition, 

(J)  is  singular.  As  detailed  in  the  last  section,  we  can  approach  this 
problem  in  two  ways. 

Working  with  the  full- order  augmented  system,  we  cannot  solve  the 
Riccati  equation  using  the  proposed  eigenvector  decomposition  algorithm. 
Square  Root  Doubling,  in  contrast,  is  perfectly  satisfactory. 

Working  with  the  segmented  system--  a  second  order  filter  and  a 
second  order  predictor--  we  can  use  any  of  the  algorithms  presented  in 
Chapter  III  to  solve  the  Riccati  equation. 

The  root  locus  of  estimator  roots  plotted  as  a  function  of  distur¬ 
bance  covariance  appears  in  Figure  5.4.  As  expected,  the  loci  for  both 
designs  coincide,  except  for  the  presence  of  an  additional  pair  of 
poles  corresponding  to  the  two  delays  in  the  augmented  system. 


»*V 
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STAR-TRACKING  TELESCOPE  OPTIMAL  ESTIMATOR 


Eigenvalues  vs 


Process  noise 
Measurement  noise 


Assumes  m  cycles  of  delay,  m  0 


Figure  5.4  LOCUS  OF  ESTIMATOR  ROOTS  VERSUS  DISTURBANCE  COVARIANCE 
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In  Figure  5.5  we  present  a  plot  of  the  error  covariance  of  the  first 
state  (the  measured  state)  as  a  function  of  the  disturbance  covariance. 
Again,  the  two  plots  coincide,  as  do  the  other  covariance  terms  (Figure 
5.6). 

Also  presented  in  Figure  5.5  is  the  first -state  error  covariance 
assuming  the  delay  in  the  estimator  is  ignored  (m  =  0)  .  In  addition, 
we  have  plotted  the  covariance  assuming  m=0  in  both  the  system  and 
the  estimator.  We  will  comment  on  these  results  after  examining  another 
approach  to  solving  this  problem. 

Comments  on  Previous  Work 

Katz,  in  his  treatment  of  the  eigenvector  decomposition  problem 
with  singular  <{>  matrix  [Katz,  1972]  conjectured,  in  effect,  that  an 
equivalent  reduced-order  estimator  with  non-singular  transition  matrix 
could  always  be  found.  An  estimator  for  this  non-singular  equivalent 
could  then  be  developed  by  direct  application  of  the  eigenvector  decom¬ 
position  algorithm.  This  conjecture,  however,  was  not  formulated  as 
an  algorithm;  it  was  simply  stated  as  an  existence  hypothesis.  No 
counter  examples  have  yet  been  found,  and  without  a  mathematical 
formulation,  the  conjecture  cannot  be  proved  correct. 

We  consider  an  example  proffered  by  Powell  and  Parsons  [Powell, 
1978].  The  example  is  again  the  star-tracker  estimator  presented  in 
Appendix  C2.  using  linear  transformations,  Powell,  et  al ,  reduced 
the  fourth-order  system  to  a  system  of  second  order.  Their  results, 
compared  to  those  derived  above,  are  presented  in  Figures  5.7  and  5.8  . 

We  note  that  for  the  case  where  measurement  noise  dominates  process 
disturbances  (small  Q/R  ratio),  the  design  is  insensitive  to  the 


ESTIMATOR  ERROR  COVARIANCE 


Comparison  of  covariance  terms  for  augmented 


122 


n)  <U  xi 
£  E  n) 
•H  O  w 

won 
n  4)  c 
iu  o  a 


<L> 

LO 

T3 

• 

P 

LO 

P) 

<L> 

U 

u 

O 

p 

4h 

bO 

•H 

V> 

U* 

p 

•H  4-> 

s 

rt  P 

o 

O  0) 

p 

£ 

<4H 

h  <L> 

O  P 

■P 

P  P 

i-H 

ct3  vi 

6  rt 

V) 

•H  <D 

Q> 

O' 

VI 

UJ  XJ 

i-H 

<D 

05 

6 

C  ctf 

•H 

•H  f-H 

+-> 

n  <y 

P< 

3  Q 

c 

0  0 


a  > 

<N 


i|t  jojj3  gu3Uiaj:tjiSB9W  jets  SK8  / 

x'  d  siBuiTjsg  utaa  swy  =  SW8I, 


Figure  5.8  ESTIMATOR  ERROR  COVARIANCE,  from  Powell  [1978]. 
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estimator  pole  placements  considered;  all  solutions  have  comparable 
performance,  measured  in  terms  of  first-state  error  covariance. 

We  would  suspect  from  our  earlier  work  that  this  design  only  approxi¬ 
mates  the  best  linear  estimator  for  this  case.  This  question  is  examined 
in  Appendix  Bl,  where  we  show  that  the  proffered  "equivalent"  second-order 
system  has  sequentially  correlated  noise,  both  disturbance  and  measurement. 
For  small  Q/R  ratios,  this  correlation  can  be  neglected,  as  shown. 

Engineering  Judgment 

It  is  clearly  important  to  identify  whether  the  problem  under 
consideration  is  sensitive  to  variations  in  the  parameters,  as  is  the 
case  in  Figure  5.3  when  Q/R  >  1,  or  insensitive  and  computationally 
robust.  In  the  latter  case,  the  increased  complexity  often  isn't 
justified  in  terms  of  the  expected  improvement. 
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V. 3  Repeated  Closed- Loop  Eigenvalues 

System  designs  which  lead  to  repeated  closed  loop  poles  also 
serve  to  differentiate  algorithms.  As  we  noted  earlier,  repeated 
eigenvalues  lead  to  difficulties  in  computing  eigenvectors. 


These  problems  arise,  for  example,  in  satellite  control  contexts 
[Bryson,  Feb.  1978],  A  conservative  system  experiencing  rotation  can 
lead  to  a  zero  configuration  of  the  form: 

I 

A  m 

o  o 

- >  Re 

o  o 


When  we  consider  the  design  of  a  controller  for  this  system,  using 
a  quadratic  performance  index,  we  recognize  that  the  non-minimum  phase 
zeroes  will  effectively  be  mapped  onto  their  real-conjugate  counter¬ 
parts  . 

If  we  now  weight  the  states  in  the  quadratic  performance  index 


N 

I  T  T 

J  =  2r  E  X  .  A  x  .  +  u  *  Bu . 
N  i=i  1  1  ii 


much  more  heavily  than  the  control,  we  find  we  have  repeated  eigenvalues 
(for  a  sample  root-locus,  see  Figure  5.9).  This  would  happen  if,  for 
example,  the  control,  u  ,  was  unbounded  (B=0)  but  we  desired  little 


variation  in  the  states. 


Imaginary 


CONSERVATIVE  SYSTEM  ESTIMATOR 
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We  note  that  for  the  example  of  Figure  5.9,  a  ratio  of  A/B  of 
greater  than  10^  yields  nearly  repeated  eigenvalues.  For  a  ratio 
of  A/B  of  greater  than  10^  ,  the  version  of  eigenvalue  decomposition 
we  have  implemented  fails. 

Square  root  doubling  also  fails  in  approximately  the  same  region. 
This  happens  because  B  going  to  zero  in  the  control  context  is 
equivalent  to  the  measurement  noise  covariance,  R  ,  going  to  zero  in 
the  estimation  context.  To  handle  the  case  of  B  or  R  equal  to  zero, 
both  algorithms  must  be  reformulated. 

The  repeated-root  example  also  arises  in  less  extreme  examples. 

In  the  case  of  repeated  roots  undisturbed  by  process  noise,  the  roots 
are  not  moved  by  the  Kalman  gains;  they  remain  repeated.  Also,  the 
roots  of  the  closed-loop  system  may  be  moved  so  that  they  happen  to 
coincide . 

In  these  cases,  with  non-zero  measurement  noise  covariance,  the 
square  root  algorithms  work;  the  eigenvalue  decomposition  algorithm 
would  have  to  be  reformulated.  Both  cases  seem  to  be  pathological. 

In  the  case  of  repeated  roots  undisturbed  by  process  noise,  the  states 
are  not  controllable  through  T  ,  and  the  Kalman  gains  are  zero.  In 
the  latter  case,  the  conjunction  of  closed-loop  roots  resulting  in  a 
Hamiltonian  without  a  full  set  of  eigenvectors  seems  rare.  Although 
we  can  conjecture  possible  problems  which  would  have  these  character¬ 
istics,  we  have  yet  to  find  an  actual  industrial  example. 


Chapter  VI 
EMPIRICAL  RESULTS 
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In  the  last  chapter  we  discussed  sets  of  problems  which  could  be 
solved  by  some  algorithms,  but  not  by  others.  In  this  chapter,  we  shall 
examine  problems  that  can  be  solved  by  all  algorithms,  but  at  varying 
cost  and  with  varying  performance. 

Cost  is  determined  primarily  by  two  factors:  the  computer  resources 
allocated  to  the  task,  and  the  amount  of  time  these  resources  are 
required.  For  the  algorithms  presented  in  Chapter  III,  the  primary 
computer  resource  will  be  main  processor  data  memory.  For  processors 
with  separated  instruction  and  data  space,  but  with  very  expensive 
instruction  memory  (e.g..  Floating  Points  Systems  AP-120B),  program 
size  will  be  an  issue.  For  processors  with  combined  memories  (e.g., 
Digital  Equipment  Corporation  PDP-11/34) ,  program  si  ze  wi  1 1  also  be  important 
for  small  problems  (when  the  program' s  overhead  is  relatively  large). 

For  solving  large  problems,  those  which  will  not  fit  in  central 
memory,  auxiliary  memory  becomes  an  important  resource.  Also  resource 
demands  on  the  central  processor  could  theoretically  be  a  primary 
resource.  However,  no  current  computer  has  a  significant  capacity  for 
allocating  part  of  its  processing  unit  to  one  problem,  while  simul¬ 
taneously  applying  other  computational  resources  to  other  problems. 

The  questions  of  large  problems  and  special  processors  will  be  relegated 
to  Chapter  IX. 

Cost,  for  the  purposes  of  this  study,  will  be  measured  in  terms 
of  memory  required  and  elapsed  execution  time. 

Performance  will  be  measured  in  terms  of  solution  accuracy.  For 
a  given  problem  and  algorithm,  how  much  accuracy  is  attained  given  a 
maximum  machine  precision. 
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VI . 2  Memory  Requirements 

Table  6.1  gives  the  memory  requirements  for  the  algorithms 
considered.  The  full  input  data  set,  consisting  of  {<f),r,H,Q,R}  , 
is  assumed  to  be  external  and. not  available  to  the  algorithm.  Similarly, 
the  output  arrays  consisting  of  the  error  covariance  matrix,  P  ,  and 
the  filter  gains,  K  ,  are  considered  to  be  separate  from  the  algorithm 
work  space. 

For  the  work  space  estimates,  the  table  presents  both  theoretical 
minima  and  requirements  for  algorithms  as  currently  implemented.  The 
minima  are  not  necessarily  practically  attainable.  Issues  such  as  data 
shuffling  overhead  and  addressing  complexity  were  ignored.  These 
considerations  may  make  this  limit  unreachable. 

The  minima  for  eigenvector  decomposition  assume  the  QR  algorithm 

operates  on  the  Hamiltonian  in  place.  The  square  root  doubling  minima 

T/2  h 

assumes  four  triangular  matrices  need  to  be  stored  (W  ,P  ,T^  and 
T9) ,  two  full  matrices  are  required  (for  T21  and  for  <J)q)  and  that  one 
full -order  temporary  matrix  is  required,  to  hold  partial  results. 

Most  of  the  algorithms  were  implemented  optimized  for  speed.  The 
square-root  doubling  algorithms  were  written  to  minimize  data  access 
time.  The  eigenvector  decomposition  routine,  in  contrast,  was  originally 
coded  for  an  IBM  370.  Presumably,  memory  was  not  considered  a  valuable 
resource. 

All  algorithms  were  coded  using  double-precision  arithmetic  (six¬ 
teen  decimal  digits  of  precision).  If  the  square  root  algorithms  perform 
adequately  with  reduced  precision,  then  the  doubling  algorithms  would 
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Table  6.1 

MAIN  MEMORY  STORAGE  REQUIREMENTS 


ASSUMING 


m,  p«n 

m,  p=n 

Implemented 

2 

Input  and  Output  Storage  2n  40 (n) 

7n2 

7n2 

Full  iteration  3n2+0(n) 

CM 

C 

00 

1  2 

Square  root  iteration  n  +0(n) 

6n2 

2  2 

Eigenvector  Decomposition  4n  +0(n) 

4n2+0(n) 

56n2 

1  2 

Square  Root  Doubling  5n 

5n2 

17n2 

1  2 

Bilinear  Square  Root  Doubling  5n 

5n2 

17n2 

Notes: 

1:  All  data  assumed  stored  at  full  precision 
2:  Assuming  the  standard,  IMSL  QR  algorithm 


n  =  number  of  states 
m  =  number  of  input  disturbances 

p  =  number  of  measurements 
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only  require  half  the  storage.  Similarly,  if  double  precision  computa¬ 
tions  are  overly  conservative  for  eigenvector  decomposition,  the  memory 
requirement  could  be  lowered  for  them  also  (see  performance.  Section 
VI. 4) . 

For  problems  with  many  inputs  and  many  outputs,  all  algorithms 
require  approximately  the  same  data  memory.  For  the  single-input  single¬ 
output  problems,  square  root  iteration  is  preferrable.  An  important 
observation  is  that  these  results  are  strongly  effected  by  the  storage 
required  for  the  input  and  the  output  of  data.  In  many  practical  cases, 
the  variations  among  algorithms  would  be  unimportant.  Also,  if  the 
input  data  can  be  overwritten,  then  the  square  root  and  eigenvector 
decomposition  algorithms  may  require  no  additional  storage. 

VI . 3  Computation  Time 

The  two  fastest  algorithms,  eigenvector  decomposition  (EVD)  and 
square-root  doubling  (SQD) ,  were  compared  to  determine  relative  speed. 

The  eigenvector  decomposition  routine  was  taken  from  a  standard  package 
designed  to  handle  many  facets  of  the  design  problem —  designing  discrete 
controllers  and  estimators  for  a  continuous  plant,  determining  observ¬ 
ability,  etc.  For  the  purposes  of  this  comparison,  this  EVD  program 
was  liberally  ravaged  to  minimize  excess  computations.  However,  it  was 
inexpertly  coded  (see  the  previous  section)  and  mere  pruning  failed  to 
optimize  the  code.  The  fundamental  algorithms--  the  Householder  and 
QR  algorithms--  were  part  of  an  industry  standard  package.  Performance 
is  therefore  assumed  to  be  near  optimum. 


131 


In  contrast,  square  root  doubling  was  implemented  for 
optimal  performance  on  a  small  machine.  Separate  facets  of  the  design 
problem  are  solved  by  distinct  programs  (see  Appendix  Al).  Each 
implementation  is  within  a  factor  of  two  of  optimal  performance  for 
the  algorithm  as  specified,  not  including  data  entry. 

The  target  machine  was  a  PDP-11  minicomputer  with  a  thirty-two 
thousand  word  address  space.  Since  the  EVD  algorithm  was  originally 
targeted  for  an  IBM  370,  with  a  relatively  unlimited  address  space, 
the  transplant  severely  limited  the  size  of  problem  that  could  be  solved. 
For  this  study,  the  SQD  program  was  limited  to  twelfth  order  problems, 
and  the  EVD  program  was  limited  to  sixth  order  problems. 

Figure  6.1  presents  preliminary  time  trials.  System  order  is 
plotted  against  relative  elapsed  execution  time,  given  by 

(SQD  time)  -  (EVD  time) 

(EVD  time) 

Thus,  positive  increments  result  when  eigenvector  decomposition  is  the 
faster  algorithm.  The  EVD  algorithm  appears  faster  in  these  results. 

This  data  was  collected  on  a  PDP-11/34  using  a  relatively  fast 
hardware  floating  point  processor  and  a  threaded-code  Fortran  compiler. 
(The  compiler  is  basically  interpretive.)  This  minicomputer  is  medium 
speed,  and  uses  a  memory  well  matched  to  the  speed  of  the  central 
processor. 

Figure  6.2  presents  the  next  time  trials.  It  consists  of  twenty- 
two  problems,  and  includes  the  results  from  the  data  presented  in 
Figure  6.1 .  These  results  were  collected  on  a  PDP-11/70  using  a  Fortran 
compiler  that  generated  pure  executable  code.  Notice  the  different 
trend. 


Ratio : 


System:  PDP  11/34  with  Princeton  Fortran 


Figure  6.1  RELATIVE  ELAPSED  COMPUTATION  TIME  -  TRIAL  I 


System:  PDP  11/70  with  Optimized  Fortran  Four  Plus  Compiler 


Figure  6.2  RELATIVE  ELAPSED  COMPUTATION  TIME  -  TRIAL  II 
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In  this  second  trial  the  square-root  doubling  algorithm  is 
consistently  faster.  Although  the  algorithms  are  identical  in  both 
cases,  the  environments  have  changed,  albeit  subtly.  The  change  in 
Fortran  compilers  has  altered  the  relative  execution  times  of  the 
program  steps.  In  the  central  processor,  the  computation-time  mix 
between  data  addressing  and  floating  point  instructions  has  changed. 
Also,  the  second  computer  uses  a  buffered  (cached)  memory  system  that 
can  significantly  alter  memory  dynamics. 

Figures  6.3,  6.4,  and  6.5  present  the  final  results  for  time 
trials,  run  on  an  IBM  370/168  using  the  Fortran  H  optimizing  compiler; 
both  programs  were  extended  to  handle  problems  of  up  to  twentieth  order. 
The  square  root  doubling  algorithm  was  also  rewritten  in  pure  Fortran, 
increasing  its  execution  time  by  approximately  80%. 

Both  algorithms  give  results  qualitatively  comparable  to  the  PDP11 
for  systems  of  order  less  than  six,  with  the  square  root  doubling 
algorithm  typically  slightly  better.  Above  tenth-order,  however, 
the  Eigenvector  decomposition  algorithm  is  markedly  superior,  out¬ 
performing  the  square  root  algorithm  by  a  factor  of  two  to  six. 

3 

It  is  interesting  to  include  a  curve  of  0(n  )  on  Figures  6.3  and 
6.4,  normalized  to  the  same  value  on  each  graph  for  n  =  5.  We  note 
that  the  square  root  doubling  trials  are  predominantly  above  the  0(n3) 
curve;  this  result  is  plausible  since  the  doubling  algorithm  requires 
O(n^)  computations  per  iteration,  with  the  total  number  of  iterations 
typically  increasing  with  system  order.  In  contrast,  the  time  trials 

3 

for  the  EVD  algorithm  are  below  the  0(n  )  curve,  as  we  anticipated  in 
Chapter  II. 

An  interesting  anomaly  can  be  found  in  the  one  eigth-order  system 
where  the  square  root  doubling  algorithm  was  faster  than  the  EVD  trial. 


Figure  6.3  ELAPSED  COMPUTATION  TIME  FOR  SQUARE  ROOT  DOUBLING 
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Figure  6.5  RELATIVE  ELAPSED  COMPUTATION  TIME  -  TRIAL  III 
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This  was  the  only  data  point  for  which  the  closed- loop  poles  of  the 
system  were  all  of  magnitude  less  than  or  equal  to  0.1  in  the  Z-plane. 

Figure  6.6  shows  examples  of  the  convergence  behavior  for  the 
doubling  algorithm,  which  behave  as  expected;  convergence  occurred 
when  the  difference  between  the  trace  of  the  covariance  matrix  on  two 
successive  iterations  was  less  than  the  machine's  epsilon.  The  error 
is  fairly  constant  for  most  of  the  computation,  and  then  drops  off 
very  rapidly. 

These  results  are  partially  conclusive.  For  small  problems, 

the  difference  between  algorithms  is  slight,  and  for  large  problems 

the  EVD  algorithm  is  consistently  faster  for  almost  all  problems. 

Alternate  formulations  of  the  SQD  need  to  be  considered  (for  example, 

-T 

calculating  T^  explicitly  instead  of  inverting  T^  ) ,  although  these 
are  not  likely  to  change  the  overall  conclusion. 

Another  valuable  lesson  has  been  learned:  that  two  very 
similar  computers  can  give  qualitively  different  results.  The  inter¬ 
action  between  algorithms  and  their  host  computers,  especially  the 
memory  dynamics,  has  not  been  properly  appreciated. 
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VI . 4  Performance 

For  all  of  the  test  problems  considered  in  the  previous  section, 
square-root  doubling  and  eigenvector  decomposition  gave  identical  results 
to  at  least  one  part  in  10^.  All  computations  were  in  double  precision 
(sixteen  decimal  digits  of  accuracy) . 

A  question  still  to  be  answered  concerns  the  precision  required 
to  accurately  and  stabily  determine  the  steady-state  Riccati  solution. 

As  demonstrated  in  Chapter  III,  square  root  algorithms  often  require 
only  half  the  precision  required  for  full  iteration,  but  these  SQD 
algorithms  may  require  full  precision  in  some  cases.  Do  these 
full -precision  problems  occur  in  practice,  or  is  this  upper  bound 
irrelevant  and  overly  loose? 

Second,  how  does  the  required  minimum  floating-point  precision 
compare  for  various  algorithms?  This  question  is  especially  important 
because  all  algorithms  are  implemented  using  double-precision  computations 
Double-precision  arithmetic  requires  twice  as  much  storage  and  up  to  four 
times  more  execution  time  than  single-precision  calculations.  A 
reduction  in  precision  is  significant  both  in  time  and  in  memory. 

The  efficient  answer  to  these  problems  required  modifying  the 
Fortran  compiler.  This  has  not  yet  been  completed. 


VI . 5  Convergence  Sensitivity 

Chapter  Four  considered  solving  the  continuous  steady-state 
Riccati  equation  using  square-root  doubling.  One  approach  required 
converting  the  continuous  problem  to  a  discrete-time  problem  by  an 
application  of  the  bilinear  transform  to  the  original  Riccati  equation. 
The  important  quantity  in  the  mapping  was  the  approximation  to  the  exact 
discrete  transition  matrix,  given  by: 
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(ctl+F)  (ctl-F)'1  (5.1) 

where  F  was  the  continuous  state  dynamics  matrix.  We  noted  that 
alpha  should  not  coincide  with  an  eigenvalue  of  F  and  should  be 
positive.  No  other  criterion  was  specified. 

Figures  6.7  and  6.8  show  the  dependence  of  elapsed  time  on  the 
choice  of  alpha  for  a  representative  sample  of  problems.  Figure  6.7 
plots  elapsed  time  versus  alpha;  Figure  6.7  plots  relative  elapsed 
time  versus  alpha  (centered  about  the  smallest  alpha  giving  minimum  elapsed 
time) .  These  results  for  square  root  doubling  are  consistent  with  the 
results  found  by  Hitz  [1972]  for  iteration  of  the  mapped,  discrete-time 
Riccati  equation. 

The  computation  converges  quickest  for  choices  of  alpha  near  unity. 

If  alpha  becomes  too  large  or  too  small,  the  mapping  (5.1)  tends  to 

the  identity  matrix,  modulo  the  sign.  When  this  happens,  the  difference 

T 

between  the  discrete-time  quantity  <f>DP<|>D  and  P  becomes  very  small. 
Similar  problems  arise  for  other  mapped  terms  in  the  equation  (see 
Chapter  IV,  equations  (3.1)  through  (3.3)). 

This  problem  can  be  severe.  In  example  (5)  of  Figure  6.8,  any 
choice  of  alpha  less  than  0.014  caused  a  divergent  solution. 

Another  complication  arises  if  F  has  a  positive  eigenvalue  near 
alpha.  For  example,  result  (3)  has  an  eigenvalue  at  .004  and  a  complex 
pair  at  .025  ±  0.64j.  If  alpha  is  near  an  eigenvalue,  then  some  entries 
of  (aI-F)~*  will  be  very  large.  The  iteration  calculation  leads  to 
taking  the  difference  of  two  very  large  and  almost  equal  numbers,  with 
concommitant  failures  in  convergence. 


Figure  6.8  BILINEAR  SQUARE  ROOT  DOUBLING  CONVERGENCE 


In  Chapter  IV  we  noted  that  the  bilinear  approach  was  basically 
a  two  step  procedure;  first  the  transform  was  taken,  then  the  iterations 
were  performed.  A  question  was  raised  concerning  the  significance  of  the 
added  overhead  of  this  initial  step.  However,  for  systems  above  fifth- 
order,  the  time  required  to  perform  the  bilinear  transform  does  not 
appear  to  be  a  significant  part  of  the  elapsed  time  for  the  computation. 

VI  .6  Conclusions 

Eigenvector  decomposition  and  square  root  doubling  require 
comparable  elapsed  computation  time  for  problems  smaller  than  sixth- 
order.  For  larger  systems,  the  QR  algorithm  was  faster. 

At  least  for  the  smaller  problems,  speed  is  not  an  important 
decision  criterion.  This  is  true  both  because  the  two  algorithms 
are  comparable  in  speed,  and  because  the  total  elapsed  time  was 
negligible  for  these  applications;  the  sixth-order  problems  always 
required  less  than  two  seconds  to  solve.  Memory  requirements  are  also 
of  little  importance,  judging  from  conjectured  minimum  memory  specifica¬ 
tions;  this  is  certainly  true  if  the  input  data  can  be  overwritten. 

Other  criteria  emerge  as  being  more  important,  including 

1)  Accuracy, 

2)  General  applicability, 

3)  Adaptability  to  and  synergy  with  the 
computing  hardware. 

The  bilinear  transform  was  quite  sensitive  to  the  mapping  constant, 
alpha.  Minimum  iterations  occurred  for  alpha  near  unity.  No  direct 
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I  relationship  between  alpha  and  the  dynamics  matrix,  F  ,  was  established. 

Hitz  [1972]  suggested  that  alpha  should  equal  the  average  of  the  moduli 
of  the  eigenvalues  of  F  .  IVe  have  insufficient  examples  to  substantiate 
J  this  conjecture.  However,  departing  from  the  optimal  choice  for  alpha 

r 

I"  by  several  orders  of  magnitude  commonly  leads  to  the  divergence  of  the 

doubling  solution. 

I  The  computational  overhead  of  the  bilinear  transform  was  negligible 

for  the  sixth-order  systems  examined. 

|  In  the  next  chapter  the  topic  changes.  The  discussion  turns  to 

r 

j  considering  the  design  of  optimal  compensation  given  qualified  uncertainty 

i 

■  in  the  parameters  of  the  system  model. 

! 
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Chapter  VII 

PARAMETER  SENSITIVITY  IN  DISCRETE  SYSTEMS 


In  many  optimal  control  applications,  whore  state-estimators 
rather  than  directly  measured  states  are  used  for  feedback,  the  closed 
loop  system  is  excessively  sensitive  to  variations  in  parameters 
[Berger,  1973].  Early  work  focused  on  dynamic  desensitization; 
Bar-Shalom  and  Sivan  [Bar-Shalom,  1969]  augmented  the  estimator  state 
to  include  the  unknown  parameters,  and  developed  a  suboptimal  scheme 
for  solving  this  problem.  Berger  [Berger,  1973]  assumed  constant  but  un 
known  plant  parameters,  with  a  known  parameter  probability  distribution 
he  solved  the  continuous  controller  problem,  assuming  rectangular 
probability  distributions.  (  His  solution  was  optimal.) 

Palsson  and  Whittacker  [Palsson,  1972]  solved  the  continuous  single 
input-single  output  case  assuming  a  Gaussian  distribution  for  the  system 
parameters.  Hadass  [Hadass,  1974]  solved  the  same  problem  for  multi¬ 
input,  multi-output  systems. 

In  this  chapter  we  present  the  optimal  solution  for  the  discrete 
multi-input,  multi-output  system  assuming  uncertain  constant  parameters 
whose  uncertainty  can  be  described  by  a  Gaussian  distribution.  We  will 
also  comment  upon  the  efficacy  of  this  approach  in  terms  of  Bryson's 
"oblivious"  filter  [Bryson,  1977]. 


MHI 


NMM 


«k 


■MMHWMl 
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VII. 2  Chapter  Outline 

We  begin  by  considering  the  uncertain  parameters  of  the  system 
as  random  variables.  A  performance  index  will  then  be  defined  as  the 
weighted  sum  of  the  steady-state  control  and  state  covariances,  which 
in  turn  are  a  function  of  random  system  disturbances  and  the  variances 
of  the  uncertain  parameters.  This  performance- index  will  then  be 
minimized  by  adjusting  a  set  of  free-parameters .  These  gains  are 
normally  the  Kalman  estimator  gains  and  the  controller  feedback  gains. 
The  appropriate  equations  for  desensitizing  discrete  systems  are  then 
developed. 

With  the  theory  established,  an  algorithm  was  developed  and  im¬ 
plemented.  Several  practical  examples  were  studied,  including  the 
problem  of  desensitizing  a  marginally  stable  high  performance  aircraft. 
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VII . 3  Sensitivity  Equations  for  Discrete  Systems 

We  begin  by  assuming  we  can  describe  tho  physical  system  as  a  set 
of  states,  x^,  governed  by  linear  vector  difference  equations: 


Vl  =  *1  xk  +  ril  Uk  +  r21  wk 


(7.1) 


=  H,  x,  +  v 


4  =  H1  Xk 


These  equations  have  two  sets  of  inputs--  a  deterministic  input,  u^  , 
and  stochastic  inputs,  w^  and  v^  .  These  stochastic  inputs  are 
assumed  to  be  Gaussian  random  variables,  determined  completely  by  their 
mean  and  variance.  We  shall  assume: 


rw,i 

r  T  Ti 
lw.  v. ]  = 

'Q 

o' 

H 

> 

1  1 

0 

R 

L  XJ 

L  J 

6.  . 

il 


f(vi)  =  6(w.) 


S(xo)  =  S(x0  v.) 


f(x0  w!)  =  0  (7.2) 


g(x  X  )  =  P 
v  o  o'  o 


Bryson  [Bryson,  1975]  has  shown  that  an  interesting  class  of  solutions 
arises  from  minimizing  the  quadratic  performance  index: 


N-l 

J  =  lim  g  Y.  (xT  A  x.  +  uT  Bu.  )  , 

..  N  .  v  l  l  li 

N-*»  i=o 


u. 

l 


-C  x. 

l 


x.  =  x.  +  K(y . -H  x. ) 

l  l  J  l  l 


xi+i  =  ♦  xi  +  r  u. 


(7.5) 


(7.4) 


where 
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and  we  choose  C  and  K  to  minimize  J  .  (We  note  in  passing  that 


if  we  select 


!>1  ,  H  =  ,  and  F  =  then  is  the  maximum 


likelihood  estimate  of  x^  as  originally  outlined  by  Kalman.)  The 
design  procedure,  then,  is  to  choose  K  to  give  the  maximum  likelihood 
estimate,  x  and  to  choose  C  to  minimize  J.  The  closed  loop 
performance  of  the  system  can  then  be  varied  parametrically  through 
the  weighting  matrices  A  and  B  . 

lo  facilitate  handling  this  system,  we  begin  by  writing 
the  augmented  state  equations: 


'  *+ii =  h 


k+ij  [KVi 


-F  C 
11 


(I-KH)  (<f>-FnC)  -  KHirnC  xR 


0  |  fw 


miV21  K  Vl 


(7.5) 


1  T  T 

J  =  lim  t?  I  f  [x.  (AC IBC)x.  ] 

N-hxi  i=o  1  1 


which  we  shall  henceforth  write  as 


=  4>  v,  +  rw 


*k+l  =  *  \ 


J  =  lim  i-  E  &[x!(A+CTBC)x.] 
N-*»  i=0 


*  Wk  =  I  \ 


A  =  Fa  o 


o  o 


c  =  [o  j  -cj 
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Introduction  of  Uncertain  Parameters 

We  assume  $  and  T  are  constant  coefficients,  but  that  they 
are  derived  from  physical  parameters  of  uncertain  value.  We  propose 
to  model  these  physical  parameters  as  Gaussian  random  variables  of 
known  mean  and  variance. 

We  usually  assume  that  all  physical  constants  are  known  exactly  -- 
the  frequency,  w,  is  precisely  25  radians/sec. 

+  Probability  of  oj 

-  ■»  oj  (rad/sec) 

25 

An  alternate  assumption  would  arise  if  we  expect  the  frequency,  w, 
to  be  about  25  rad/sec,  but  we  are  95%  certain  w  lies  within  +  5 
radians  of  the  expected  value.  Modelling  u  as  a  Gaussian  distri¬ 
bution,  then  95%  certainty  encompasses  about  2  standard  deviations, 
so  the  variance  of  w  would  be  about  6,  and  we  would  have 

f.(u))  =  25  rad/sec 

-  -  2 
f. (w-to)  (to-w)  =  6(rad/ sec) 

to(rad/sec) 

Note  again  that  we  assume  w  remains  constant  over  time,  but  that  we 
are  uncertain  as  to  its  exact  value. 

We  shall  henceforth  describe  these  uncertain  parameters,  \(j  ,  with 
a  prescribed  and  constant  normal  distribution: 
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<J>  =  N(*,V) 

X|c+1  =  *(<|0xk  +  T(WM(k)  (7.7) 

N-i 

J  =  lim  J-  l  f-  Ix-CA+C'bOx.]  . 

N-+CO  i=o 

We  now  will  be  taking  the  expected  value  of  over  the  random 

processes  w^  and  v  ^  +  ^  ,  and  over  the  randon.  vector  ip  . 

Since,  for  any  vector  v,  w>  and  matrix  A  we  can  write: 

T  T 

v  w  -  trace  (wv  ) 

T  T 

v  (Aw)  =  trace  [(Aw)v  ] 


we  have 

N-l 

J  =  trace  {[A  +  CTBC]  lim  i-  £  sCXjxJ)}  (7.8) 

N-«°  i=o 

T 

If  we  can  find  an  expression  for  f(X^X^)  >  then  we  can  minimize  J 
by  appropriate  choice  of  C,  K,  cj>,  H,  and  f  . 

The  uncertain  parameter  vector,  ip  ,  entered  in  a  non-linear 
fashion,  so  we  begin  by  linearizing  the  equation  for  X],  >  (7. 6),  about 

the  nominal  value  for  i|»  . 

x(n,<M  =  xnom(n,^)  +  <$x(n,<ty) 

where  n  is  the  random  process  composed  of  w  and  v,  Xn°m  is  the 
state  vector  obtained  when  the  parameters  have  their  nominal  values, 
and  dX  is  the  perturbation  in  the  state  vector  due  to  a  perturbation, 
6 ip  ,  in  the  uncertain  parameters.  Our  goal  is  to  write  state  equations 
in  terms  of  Xn0m  and  <5X  .  Assuming  small  perturbations  in  \p  ,  so 
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that  a  first  order  expansion  is  satisfactory,  we  have 


xk+l^n’^  +  '5X)C+1  =  $(4+<$4)  [Xk°m(n,4)  +  6xk(n»5Wl  +  r(4+64)Mk 


Linearizing  $  by  taking  a  Taylor  Series  expansion,  we  get 


$(4+64)  =  (4+64-4)  +  R2(&W 


Neglecting  the  higher  order  terms,  i.e.,  assuming  R?  w,  0,  we  have 


4(4+64)  =  4(4)  +  64(64)  ,  64(64)  =  E  64. 

9vi  4=4 


r(4+6ijj)  =  r(ij))  +  <sr(54) 


CT  +  6xk+i  =  (*+,s$Kxkora  +  «xk)  -  (r+br)N1 


=  ($Xk0m  +  rNk)  +  (6$xk0m  +  $6yk+  5rNR)  +  (646Xr)  . 


The  last  term,  to  first  order,  is  again  neglected.  Separating  by 


independent  variables,  we  get 


x£>,4)  =  $(4)xnk°m(n,4)  +  r(4)Wk  ,  xJ°m  =  o  (7.ioa) 


<sXk+1(n,<S4)  =  '5$(64)xk°m(n,4)  +  $(4)6xk(n,64)  +  6T(64)Wk  ,  6x0  =  0 


(7.10b) 


With  x  and  6X  expressed  at  linear  functions  of  p  and  4  , 

T 

we  now  consider  evaluating  5X-X^ 


T  T 

_  T  _  nom  nom  „  nom  .  T  c  t  nom  „  .  x  T 

4XiXi  =  Sx  X  +  4  X  <5x+4<$xx  +  &5X  «X 

n,4 


Examining  the  second  term,  we  find 


^,[x(n>^)<Sxr(n,>5<i') ]=  S txCn)  e  <$x 1  (n,s<fO  ] 

n  ip 


f  c5xk  +  1  =  ^  ([54>Xk0nl  +  4»6xk  +  6rwk) 


=  ||  r.(6i^xnom)  +  <t>  £  6Xk  +  U  V 


f  (Sip)  =  t'(6'pxJ  -  feC'S^n)  =  0  ,  since  n  and  ip  are  independent. 


Thus ,  t "5 x(l , 

n 


,  [xn°m6xT]  =  0  =  8 [6XXn°m 


Further 


_  nom  .  f.  nom  pi/  ^  _  n 

\*\ =<  Wxk  *  rv  ■ 0 


n-  —  _  ’p 

IVc  have  that  r  =  f{[x^“X^J  [x^“  x^J  }  so  we  can  define 


covariance  matrices-- 


t  _  nom  nom  r.  ^  t  v  l  x v 

>'  X.X1  =  t  Xi  Xi  +  f  ^XifiX  -  li  + 

n,4»  1  1 


where  X^  is  the  state  covariance  due  to  random  disturbances,  and 
6X.  is  the  additional  state  covariance  arising  from  the  uncertainty 

— l 

in  the  parameters . 

If  we  assume  a  stable  system,  then  the  state  covariance  converges 
to  a  steady-state  value,  and  we  can  re-write  the  performance  index  as 


J  =  trace  [ (A  +C  BC) [ X+fX J J 


where  X  and  §X_  are  the  appropriate  steady  state  covariance 
matrices. 

X  follows  directly  from  multiplying  the  governing  equation  for 
ynom  by  transpose  and  taking  the  expected  value;  the  result  is 
the  well-known  Lyupunov  Equation: 


$  x  fT  -  x  =  -  r  0  rT 


0  =  &  wwT 


Q  o 

0  R 


Similarly,  we  have 

6yk+i<i  =  (<$4>X  +  +  «rWj  (xT6<f>T+6xT<!)T+W  6r  ) 

=  s^xx^^  +  +  srsisi^sr^ 

+  6(}>xi5x^(})T  + 

+  4>6xxT|54>T  +  4>6xWT6FT 
+  srwx'V  +  <5rM($xT<5T 


Taking  expected  values  of  both  sides,  many  terms  are  zero 
yielding 

<i>T  +  g  [6$X,  6$T  +  6r©6rT] 

*  ■* 

+  &  [61>x,  5xJ<t>T  +  ^Xi-X^1] 
n,<J>  k  k 

to  solve  this  equation,  we  introduce  the  intermediate  variable 


“k.i  ■  4  *2k 


nom  ,  T 

:  «  xt 
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so  that 


$  5X_  $T  -  6X  =-t'  [6$X6<tT+6rO(5rT+6(}>Y!fT+(}>YT6(})T] 


To  solve  for  Y  ,  we  again  write  the  equations  for  x  and  <$X  and 
take  expected  values  of  both  sides  w.r.t.  q  to  get 

<J>Y$T  _  y  =  -  <J>X"6$T  -  T06rT 


Assuming 
the  solution 
Choose 


=  ^  (ip-ip) A  is  a  diagonal  matrix,  we  can  now  write 


as : 


K,  C,  <j>,  T,  and  H  to  minimize  J,  where 


<t>x<i>T  -  x  =  -ror1 


<j)Y4)T  -  y  =  -$x<5$T  -  resrT 


<p6X4>T  -  5X  =  -Z 


3<p  — 


ltmT  *  «T(||:)T 


— 11 


J  =  trace [(A+C^  BC) (X+oX) ] 


These  three  equations  can  now  be  solved  sequentially  to  yield  J.  We  can 
then  iterate,  using  a  gradient  search  procedure,  to  find  the  minimal 

value  of  J  . 

The  program  implementing  these  equations  will  be  described  in  a 
separate  technical  report. 


mm 
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VII. 4  Augmented  Observability  and  Controllability 
Choice  of  A,  B,  and  r\ 

Claims  appear  in  the  literature  [Katz,  1974]  to  the  effect  that 
the  result  of  optimizing  a  quadratic  performance  index,  for  example 
N-l  t 

J  =  I  x .  Ax .  +■  u*Bu.  (7.11) 

i=0  1  1  1  1 


always  yields  a  stable  closed-loop  system.  That  this  is  not  true  for 
a  large  class  of  systems  will  be  shown  in  the  examples  which  follow. 

If  A  of  equation  7.11  is  a  symmetric,  non-negative  definite  matrix, 
then  we  can  decompose  A  into  an  mxn  vector  d  satisfying 

A  =  d'd  (7.12) 


where  m  is  the  rank  of  A.  If  we  have  a  system  of  the  form 

x  =  Ax  +  bu 

A  sufficient  condition  for  ensuring  stability  of  the  closed  loop 
system  is  for  the  related  system 


x  =  Fx  +  bu 
y  =  dx 


(7.13) 


to  be  detectable.  This  is  directly  related  to  a  further  result-- 
that  the  performance  index  will  be  finite  as  long  as  the  unstable  modes 
of  (7.13)  are  not  observab le- so  a  finite  performance  index  does  not 
ensure  a  stable  optimal  solution. 

The  dual  result  is  that  the  Kalman  filter  will  be  stable  if  the 


system 
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VI I . 5  Application  to  Parameter  Desensitization 

Using  exactly  analogous  arguments,  we  can  see  immediately  that 
failing  to  meet  the  augmented  observability  and  controllability  condi¬ 
tions  will  have  a  profound  effect  on  the  performance  of  the  desensiti¬ 
zation  algorithms.  Given 

J  =  Z  xT  Ax  +  uTBu 
T  T  T 

We  redefine  x  Ax=x  ddx 
z  =  d  x 


If  a  state,  x,  is  not  observable  through  the  observations  z, 

then  the  uncertainty  of  that  state  will  not  be  observed  in  J. 

So,  the  effect  of  varying  parameters--  such  as  the  gains  K  and  C-- 

will  not  be  seen  in  the  performance  index,  J  .  We  must  also  consider 

the  effect  of  uncertain  states  on  the  estimator.  It  can  be  shown  that 

minimizing  the  aforementioned  performance  index,  without  introducing 

sensitivity  considerations,  will  yield  the  expected  Kalman  filter  as  an 

estimator.  We  therefore  expect  the  efficacy  of  the  desensitization  to 

T  ’3 

be  effected  by  the  controllability  of  the  system  {F,(rQF  )  }. 

For  example,  consider  a  simple,  first-order  system  with  no  process 


noise 


X. 

=  ax,  +  u. 

T 

+  w.  c  w  .  W  . 

1+1 

1  1 

1  1  J 

y . 

II 

rr 

X 

+ 

,  T 

f-  V  .  V  . 

J 1 

1  N-l 

1  J 

J 

=  lim  i  Z 

£-[  ??(A-*CTBC)  xj 

N-*»  i=o 


Clearly,  this  system  is  not  controllable  (via  w)  in  the  augmented 
sense.  If  we  now  synthesize  the  optimal  estimator 


We  can  write  the  augmented  state  equations 


a 

kha 


Consider  the  sensitivity  equations, 

$  x  4>T  =  -rerT 
HJT  -  Y  =  -4>X6$T  -  T06rT 

$  6X  $T  -  6X  =  -u  ■  .  •  ]?■  • 

—  —  *•  -t-ii 

T _ 

J  =  trace[  (A+CBC)  (X^+6X)  ] 

If  we  assume  k  equals  zero,  then  rerT  is  zero.  It  follows  that 
as  long  as  <j>  and  A  are  positive  definite,  X  =  0  solves  the  first 
Lyapunov  equation.  Similarly,  Y  =  0  solves  the  second  Lyapunov  equation, 
and  =  0  solves  the  third!  The  conclusion,  of  course,  is  that 
J=0  ,  for  all  choices  of  c  yielding  a  stable  system  matrix  F  .  And, 
since  J  must  be  non-negative,  we  have  found  an  optimal  choice  for  k  . 

Clearly,  the  limited  control  over  the  closed  loop  poles  (via  choosing 
the  feedback  gain  c)  is  intolerable.  The  important  point  for  the 
moment,  however,  is  that  our  uncertainty  in  the  value  of  the  system  para¬ 
meter  a  has  no  effect  on  our  choice  of  feedback  gains —  k  or  c  -- 
as  long  as  the  system  is  not  controllable  in  the  broad  sense. 
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Turning  once  again  to  the  original  dynamic  equations,  after 
linearizing,  we  had  (7.10) 


nom  ■  nom  p,> 

x.  ,  =  x-  +  1  N 
1  +  1  Y  1 


<$xi+1  =  4>  <$x^  +  <S4>x^  +  <$rW 


Assuming  k=0  ,  we  have 


nom  .  nom 
xi+l  =  **i 


6xi+i  =  4>6xi  +  (S^Xj 


With  stable  dynamics  matrices,  xnom  is  undriven,  and  will  decay  to 
zero.  Therefore,  6x  will  be  undriven,  and  will  also  decay  to  zero. 

As  long  as  x  =  xn°m  +  <$x  is  undriven  by  stochastic  disturbances ,  the 
state  will  decay  to  zero  in  steady  state,  and  be  unaffected  by  (small) 
errors  in  the  dynamics  matrix  (assuming  a  stable  dynamics  matrix). 

This  is  the  same  type  of  result  we  see  with  "oblivious"  filters 
[Bryson,  1978].  With  oblivious  filters,  if  a  mode  is  unexcited  by 
disturbance  noise,  then  the  Kalman  gain  associated  with  that  mode  decays 
to  zero.  In  our  case, we  find  that  if  a  mode  is  undisturbed,  then  it  is 
ignored  in  the  desensitizatioa  procedure! 

Further,  in  the  steady  state,  the  impact  of  parameter  uncertainty 
is  determined  both  by  the  uncertainty  and  by  the  strength  of  the  noise 
driving  the  mode.  Lightly  driven  modes  are  lightly  weighted  in  the 
performance  index. 

This  explains  the  necessity  for  the  scale  factor  Hadass  introduced 
to  boost  the  impact  of  the  parameter  uncertainty,  and  the  necessity  for 
using  unrealistically  large  variances  for  parameters  in  some  practical 
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examples . 

We  can  gain  additional  insight  into  the  nature  of  this  problem  by 
examining  equation  (7.10b) 

6xk+i  =  6<fxk°m  +  *6xk  +  6r\  +  &*&X\  *  6X0  =  ° 

We  assumed  in  the  derivation  that  the  last  term,  to  first  order,  can 
be  neglected.  This  term  provides  the  direct  coupling  from  parameter 
uncertainties  (in  6<ji)  into  state  uncertainty  (in  <$X^) ,  and  hence 
into  the  performance  index,  J.  Therefore,  neglecting  this  term  has 
an  unfortunate  consequence,  especially  if  the  state  is  undisturbed  as 
outlined  above. 

Unfortunately,  with  the  elaboration  of  equation  (7.10b),  the  ex¬ 
pected  value  of  with  respect  to  i(j 

S6*k 

is  no  longer  zero.  The  formulation  no  longer  follows  directly  as 
outlined,  and  an  alternative  formulation  must  be  sought. 


V 
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VII. 6  Applications 

A  computer  program  was  written  to  solve  the  Lyapunov  equations  for 
minimizing  sensitivity  to  parameter  uncertainties.  This  program  can 
handle  both  continuous  and  discrete  systems . 

Program  Verification: 

We  would  expect  discrete  systems  with  relatively  fast  sample  rates 
to  display  parameter  sensitivities  analogous  to  the  corresponding 
continuous  system.  We  therefore  chose  a  continuous  system  originally 
investigated  by  Hadass  [Hadass,  1974].  This  choice  allows  us  to  verify 
both  the  continuous  solution  and  the  discrete  solution  generated  by  the 
computer  program. 

A  description  of  the  model  can  be  found  in  Appendix  C4.  It  was 
basically  a  fourth  order  system,  with  the  governing  equations: 

IjG  =  +  w1 

I2('<£+0)  =  -k<J>  +  u  -  w^  +  w2 

where  the  state  vector  is  XT  =  [9,9,<j>,|],  the  input  is  u,  and  Wj  and 
w2  are  process  noise  sources.  The  output  of  the  system  is  y  =  0  +  v  , 
where  v  is  the  measurement  noise  source.  We  use  the  data  presented 
by  Hadass  on  pages  55  and  56,  except  we  need  to  use  different  noise  power 
spectral  densities  (to  match  his  results).  Specifically: 


Qw  = 


6.6  x  10 


-2 


2.0  x  10 


-2 


[rad^sec  ^ ] 


r  =  3.4  x  10"^  rad^sec 
v 
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We  assume  all  of  the  model  parameters  are  known  exactly,  except  for  the 
spring  stiffness,  k  .  If  we  assume  k  has  a  nominal  value  of  25.0 
and  a  covariance  of  49.,we  would  expect  the  program  would  give  the  results 
presented  by  Hadass  on  pages  99  through  105.  This  was  the  case. 

Next, we  derived  the  discrete  equivalents  of  the  continuous  system, 
using  the  transformation: 

x  =  Fx  +  Gu  +  Fw  (continuous) 

Xn+1  =  ^xn  +  Flu  +  r2w  (discrete) 

y  =  H  x  +  v 

where  we  assumed 

FT 

<(>  =  e  ,  T  is  the  sample  interval 

F^  =  fj  $(t)Gdt  ,  where  $(r)  =  eFx 

^2  %  ^ 

The  fastest  roots  of  the  physical  system  lie  at  5  radians,  so  if  we 
choose  a  sample  interval  of  25  milliseconds,  or  40  samples  per  second, 
we  expect  the  behavior  of  the  continuous  and  discrete  systems  to  be 
nearly  identical. 

The  results  of  an  optimization  run  are  presented  in  Table  7.1  and 
Figure  7.1.  The  results  can  be  compared  along  several  dimensions. 

First,  over  what  range  can  we  vary  the  spring  stiffness  and  still 
retain  stability?  This  is  given  by  the  terms  AK+  and  AK  ,  where 
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Table  7.1 

RESULTS  OF  PROGRAM  VERIFICATION 


iteration 

PARAM 

CONTINUOUS 

DISCRETE 

0 

AK 

_ 

0.344 

0.344 

+ 

0.297 

/ 

0.297 

A 

J 

n 

1.02  x 

104 

1.02  x 

104 

A 

1.19  x 

104 

1.19  x 

104 

T 

2.22  x 

104 

2.22  x 

io4 

OJ 

25.0 

25.0 

3 

AK 

. 

0.625 

+ 

- 

0.828 

A 

J 

n 

1.12  x 

lot 

1.15  x 

10? 

A 

5.85  x 

ioJ 

4.07  x 

1°4 

T 

1.71  x 

104 

1.55  x 

io4 

CJ 

30.2 

25.0 

15 

AK 

.  * 

0.539 

+ 

1.132 

/. 

- 

J 

n 

1.21  x 

103 

1.16  x 

io4 

A 

3.19  x 

104 

3.42  x 

104 

T 

1.53  x 

io4 

1.51  x 

io4 

u> 

26.4 

26.7 

AK  -  relative  change  of  spring 

stiffness  causing 

instability 

relative 

decrease 

+  relative 

increase 

J  -  performance 

index 

n  -  nominal, 

no 

uncertainty 

A  -  addition 

.  due  to  uncertainty 

T  -  sum  of  nominal  and  A 

a)  -  nominal  spring  : 

stiffness 

coefficient 

used  in 

estimator 

values  of  AK  quoted  for  44  iterations 


* 


MM 


mmm 
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Closed-Loop  Pole  Gravitation 


Figure  7.1 


DESENSITIZATION  LOCUS 
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n 


Ak 


k2 


where  k^  and  k?  are  the  coefficients  yielding  marginal  stability 
if  we  increase  or  decrease  k,  respectively.  From  Figure  7.1  the 
results  are  comparable. 

Second,  we  would  expect  the  performance  index  to  closely  match-- 
this  is  an  excellent  check  of  the  two  sets  of  Lyapunov  equations.  We 
have  a  nominal  performance  index,  ,  which  is  generated  without 
considering  parameter  uncertainty;  next,  we  have  the  added  cost 
due  to  uncertainty,  ;  and  finally, we  have  the  total  uncertainty,  . 

We  note  a  good  correspondence  initially,  before  iteration  began;  there 
is  also  a  good  correspondence  between  the  continuous  and  discrete  results 
after  fifteen  iterations--  when  improvement  in  the  index  had  effectively 
stopped. 

The  final  analytic  criterion  for  comparison  involves  the  coefficient 
w;  since  k  was  an  uncertain  parameter,  we  chose  to  optimize  the 
estimator  by  varying  the  Kalman  gains  and  by  varying  the  assumed  value 
of  the  spring  stiffness  in  the  estimator.  The  result  of  the  freedom 
was  an  increase  in  the  assumed  stiffness--  from  25.0  to  over  26.  The 
correspondence  is  again  good  between  the  two  cases. 

Next  we  look  at  the  pole  map  in  Figure  7.1.  This  diagram  plots  the 
relevant  closed  loop  pole  locations  as  a  function  of  program  iterations. 
The  plane  we  plotted  is  the  discrete  or  z-plane;  stable  roots  have  a 
magnitude  of  less  than  1,  lying  within  the  unit  circle.  We  have  plotted 
the  result  of  the  discrete  solution  directly,  and  the  results  of  the 
continuous  solution  using  the  mapping 
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where  z  is  the  discrete  pole  location,  s  is  the  continuous  pole 
location,  and  T  is  the  sample  interval. 

We  begin  by  noting  that  even  when  the  performance  index  has  converged 
to  its  terminal  value--  which  occurs  at  approximately  fifteen  iterations-- 
the  pole  locations  for  this  value  of  the  index  are  only  loosely  fixed. 

In  other  words,  there  are  a  wide  choice  of  filter  gains  and  closed  loop 
pole  locations  which  will  yield  the  same  value  for  the  total  performance 
index.  This  behavior  is  expected  when  using  a  gradient  search  procedure; 
the  final  value  depends  upon  the  starting  conditions  and  the  number  of 
iterations  performed.  The  results,  however,  are  all  qualitatively 
identical . 

We  note  that  the  discrete  solution  is  bracketed  by  the  continuous 
solutions.  We  therefore  conclude  that  all  tests  comparing  the  discrete 
and  continuous  cases  have  yielded  the  expected  results. 

An  Example 

As  a  practical  example  of  ceducing  sensitivity,  we  investigated 
the  test  case  presented  in  Appendix  Cl.  This  example  is  a  hypothetical 
high-performance  aircraft,  henceforth  termed  the  FH,  modelling  the 
longitudinal  short  period  mode  and  the  first  bending  mode.  In  addition, 
wind  gists  were  modelled  as  a  first  order  Gauss  Markov  process.  A 
.  ■cr'.'te  compensator  was  then  derived. 

■■ ■■■:•  v.inr  specifications  are  that  we  have  a  fifth  order 
riit-  marginally  stable  inodes  (the  bending  modes) 

,  •  •  short  p'-riod  modes.  The  details  of 
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the  model  derivation  are  somewhat  involved;  they  are  given  in  the  appendix. 
The  short  period  poles  are  located  at  -2.5  ±  13. 6j ;  the  bending  mode 
poles  are  located  at  -0.5  ±  25. Oj . 

Katz  [1974],  investigated  the  sensitivity  of  an  optimal 
discrete  compensator  for  this  example,  and  offered  ad  hoc  design 
modifications  for  reducing  sensitivity.  We  intended  to  begin  with  his 
initial  design,  and  to  then  use  the  desensitizat ion  procedures  to 
systematically  reduce  the  sensitivity.  Presumably  we  could  improve  his 
results,  and  gain  insight  into  his  ad  hoc  desensitization. 

The  latter  objective--  of  gaining  insight--  proved  to  be  our  most 
valuable  contribution,  and  partially  obviated  our  other  goals.  Specifically, 
Katz's  initial  design  is  intrinsically  an  invalid  design  for  achieving 
reasonable  parameter  sensitivity. 

Katz  initially  chose  to  work  with  a  physical  model  where  the 
bending  modes  were  uncoupled  from  the  short  period  modes,  and  he  further 
chose  a  cost  function  weighting  matrix  which  ignored  these  higher 
frequency  modes.  Further,  these  bending  modes  were  only  lightly  driven 
by  the  process  noise.  As  a  result,  we  were  synthesizing  compensation 
for  a  system  which  was  unobservable  and  only  marginally  controllable 
(in  the  augmented  sense  defined  above). 

Since  we  started  with  a  highly  sensitive  system  (bending  mode  poles 
at  -0.5  ±  25 j ,  with  a  locus  quickly  crossing  the  axis),  we  would  not 
expect  good  compensation  to  result  from  ignoring  these  modes--  as  Katz 
found.  His  solution  was  to  weight  the  bending  mode  states  in  his  cost 
function  and  to  inject  process  noise  into  the  dynamics  for  these  states 
(pages  103-109).  This  apparently  ad  hoc  approach,  however,  can  be 
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clearly  understood  as  soon  as  it  is  interpreted  in  terms  of  augmented 
controllability  and  observability. 

The  added  process  noise  introduces  a  significant  covariance, 
deriving  from  the  bending  mode  states,  and  since  they  are  weighted, 
they  significantly  affect  the  performance  index.  The  resulting  design 
is  significantly  less  sensitive,  as  Katz  demonstrated  and  as  theory 
predicts . 

From  our  earlier  theoretical  work,  it  is  now  clear  why  Katz's 
initial  design  is  inadequate  for  our  program--  our  algorithms  will  not 
affect  unobservable  or  uncontrollable  modes  (exactly  those  modes  which 
in  this  case  introduce  the  sensitivity  problem).  Therefore,  we  abandoned 
our  goal  of  beginning  with  Katz's  original  design.  Instead,  we  set  a 
lesser  goal.  Beginning  with  the  final  results  for  the  FH  aircraft,  we 
attempted  to  improve  upon  them. 

The  example  we  chose  was  the  FH  aircraft  flying  at  Mach  1.2  at 
ground  level.  In  the  model  we  assumed  light  coupling  from  the  bending 
mode  back  to  the  short  period  mode,  a  heavy  injection  of  process  noise 
into  the  bending  mode  states,  and  a  light  weighting  of  the  bending  mode 
states.  Selecting  a  sampling  rate  of  10  Hertz,  we  began  from  Katz's 
least  sensitive  design. 

Figure  7.2  shows  a  root  locus  of  the  closed  loop  bending  mode  poles 
versus  the  uncertain  parameter —  the  bending  mode  frequency.  Before 
optimization,  the  stability  range  as  a  function  of  is  limited 

to  a  16%  decrease.  After  optimization,  the  permissible  change  in 
is  a  24%  decrease.  Optimization  has  increased  the  stability  range  by 


50%. 
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FH  Aircraft--  Mach  1.2  at  Altitude  =  0 
Locus  of  Bending  Mode  Poles  versus  Wg  Given 
Compensation  Designed  for  =  25  radians 


Figure  7.2  AIRCRAFT  PARAMETER  UNCERTAINTY  DESENSITIZATION 


Cost 


This  program  is  slow,  complex,  large,  and  therefore  expensive. 

Each  "run"  on  a  tenth  order  problem  (fifth  order  model  and  fifth  order 

compensator)  cost  ten  to  twenty  dollars  on  an  IBM  370.  The  program 
4 

runs  in  0(n  )  time. 

Conclusions  and  Comments 

An  algorithm  for  desensitizing  discrete  closed  loop  systems  given 
parameter  uncertainly  was  presented.  The  importance  of  augmented 
controllability  and  observability  in  this  procedure  was  noted.  The 
algorithm's  sensitivity  to  changes  in  magnitude  of  disturbance  noise  was 
noted  and  several  other  anomalies  were  explained;  in  particular,  we 
demonstrated  that  neglecting  second-order  terms  is  not  always  possible. 

The  observations  on  augmented  controllability  were  extended  to 
explain  other  ad  hoc  desensitization  procedures. 

Stability  margin  improvement  was  demonstrated  for  a  practical 
prob lem. 

The  procedure  is  comparatively  expensive. 
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Chapter  VIII 


REDUCED -ORDER  COMPENSATORS 


In  many  applications —  including  the  design  of  compensators 
intended  for  implementation  in  digital  processors--  the  complexity  of  the 
compensator  is  a  significant  issue.  Typical  state-space  design  algorithms 

yield  compensators  whose  complexity  is  comparable  to  the  complexity  of 
the  original  model  for  the  physical  system.  If  a  reduced-order  compensator 
is  desired,  the  designer  usually  begins  by  trying  to  simplify  the 
physical  model. 

We  will  consider  an  alternate  approach,  suggested  by  J.D.  Powell 
[1976],  which  does  not  require  simplifying  the  original  model.  Using 
the  program  mentioned  in  Chapter  VII,  we  will  consider  designing 
compensators  of  arbitrary  order.  The  design  criterion  will  again  be 
to  minimize  the  expected  variance  of  the  states  of  the  physical  system. 

First,  we  will  gain  insight  into  the  problem  by  working  with  a 
simple  two-body  problem.  Then  we  will  consider  the  result  of  designing 
reduced-order  compensation  for  a  seventh-order  star-tracking  telescope. 
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VI II. 2  Design  of  Reduced-Order  Compensation 

We  wish  to  consider  techniques  for  designing  reduced-order  feedback 
using  modern  control  procedures.  We  begin  with  a  simple  two-body 
problem,  and  use  frequency -domain  analysis  to  evaluate  performance. 

Three  design  techniques  will  be  compared. 

1)  Classical  Technique 

A  classical  design  of  comparable  order  provides  insight  and  can 
be  used  to  qualitatively  evaluate  the  modern  design  technique. 

2)  Engineering  Judgment  Technique 

The  system  being  modelled  is  simplified  to  the  desired  complexity 
using  engineering  judgment.  A  compensating  network  is  designed 
for  this  simplified  system,  and  then  the  compensator  is  analyzed 
for  performance  in  the  original  system.  The  technique  is 
iterated  until  satisfactory  performance  is  obtained. 

3)  Optimal  Technique 

The  compensating  network  is  designed  to  optimize  the  performance 
of  the  original  system,  within  the  constraints  imposed^by 
limiting  the  order.  As  in  Chapter  VII,  the  original  system  will 
be  augmented  with  a  compensating  network,  and  then  a  gradient 
search  procedure  will  be  used  to  choose  the  parameters  of  the 
compensator  to  optimize  performance. 

For  the  example  considered,  the  three  procedures  are  equivalent  for  low 
bandwidth  systems.  For  high  bandwidth  systems,  the  third  method  yields 
designs  which  are  stable,  with  maximal  bandwidth  and  margin.  The  second 
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method,  in  general,  yields  unstable  results,  and  the  selection  of  some 
design  parameters  (e.g.,  weighting  functions)  becomes  counter  intuitive. 

Problem: 

The  example  system  considered  was  the  fourth- order  two-body 
problem: 

| - *■  y(t) 


h** 

xl(t)  x2(t) 


which  we  write  as 

X  =  Fx  +  Gu  +  Fw 
y  =  Hx 


y  =  [0  0  1  0] x  +  v 


choosing  =  1 

K  =  5 

b  =  .01 


we  have  the  transfer  function 


,  0 . 01 ( s+2500) 

t  Sy  ry 

s“(s+.01*7.07j) 


a  system  with  two  poles  at  the  origin,  two  lightly  damped  poles  at 
7  radian/sec,  and  a  zero  near  infinity. 

Figure  8.1  presents  the  open  loop  Bode  plot  for  the  full  fourth 
order  system.  We  note  that  for  appropriately  slow  systems  we  can 
approximate  the  system  transfer  function  as: 


G'(s)  = 

s 


Our  objective  will  be  to  design  a  second  order  compensator  for 
G(s)  which  maximizes  the  system  bandwidth  while  maintaining  satis¬ 
factory  phase  and  gain  margins. 


Classical  Design--  pole  placement 

Assuming  a  second  order  system,  G'(5)  >  an  optimal  design  using 
any  of  the  algorithms  mentioned  in  Chapter  IV  will  yield  a  second-order 
compensator  with  a  single  zero: 


H(s)  = 


K(s+oQ 
OB) (s+y) 


Since  the  objective  in  executing  a  classical  design  is  to  gain  insight 
into  the  problem,  we  simplify  the  design  by  limiting  ourselves  to 
real  poles;  we  must  therefore  determine  four  real  numbers —  K,  a,  B,  Y  -- 
to  yield  H(s) . 

Considering  the  Bode  plot  for  G(s)  [Figure  8.1]  note  that  the 
phase  of  G  is  always  less  than  -180°  .  Therefore  H(s)  must  provide 


Figure  8.1  SAMPLE  PROBLEM  BODE  PLOT 


degrees 
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phase  lead.  H(s)  should  also  provide  adequate  margins  and  a  maximum 
bandwidth.  Using  the  open  and  closed- loop  Bode  plots  and  the  Nyquist 
plot  for  G(s)*H(s)  [Figure  8.2]  we  can  determine  reasonable  bounds 
on  these  margins  and  on  the  bandwidth. 

Stability 

Either  the  crossover  through  -180°  degrees  must  occur  before  "C" 
or  after  the  magnitude  peak  at  7  radians.  Since  there  is  a  phase  change 
of  -180°  at  7.07  radians,  only  the  first  alternative  is  viable;  the 
phase  must  be  less  than  -180°  at  approximately  6  radians /second. 

Phase  Margin 

The  phase  margin  will  be  the  minimum  of  the  phase  angles  at  "A" 
and  "C"  radians.  Thus,  we  want  to  maximize  the  height  of  the  phase 
peak  and  roll  off  the  phase  curve  as  quickly  as  possible  beyond  "B" 
radians/second.  The  minimum  of  "A"  and  "C"  is  approximate^  30°  . 

Bandwidth 

The  bandwidth  is  the  frequency  range  over  which  the  output  of  the 
closed  loop  system  follows  the  input  with  less  than  a  3db  magnitude 
deviation.  Looking  at  the  closed  loop  bode  plot,  this  deviation  can 
occur  at  either  "a,"  "b,"  or  "c"  radians.  If  the  magnitude  of  G*H 
is  greater  than  -7  db  at  point  "B,"  then  "c"  determines  the 
bandwidth.  If  the  magnitude  of  G*H  is  less  than  -7  db  ,  then  the 
magnitude  at  "b"  radians  will  be  less  than  -3db  .  This  gives  two 
bandwidth  ranges: 

1)  in  the  range  of  5  to  7  radians  (point  "c")  , 

2)  in  the  range  of  less  than  4  radians  (point  "b")  • 
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Gain  Margin 

The  gain  margin  is  the  reciprocal  of  the  magnitude  at  point  "B"  ; 
decreasing  the  magnitude  at  "B"  increases  the  gain  margin.  Thus, 
gain  margin  and  bandwidth  must  be  traded  against  one  another.  For  a 
bandwidth  less  than  1  radian  any  gain  margin  can  be  obtained.  For  a 
bandwidth  of  more  than  5  radians,  the  gain  margin  must  be  less  than 
2.3. 


Conclusion 

Our  objective  is  to  design  a  high-bandwidth  system.  We  therefore 
expect  to  attain  a  bandwidth  of  about  6  radians,  a  gain  margin  of  about 
2,  and  a  phase  margin  of  about  30°  . 


Examples 


Choosing  ( s) 


lOOCs*. 2) 
(s+5)2 


Phase  margin  =  18°  (determined  by  "C") 

Gain  margin  =  2.2 

Band  width  =6.3  radians  (determined  by  "c”) 

Trying  to  improve  the  phase  margin,  we  can  increase  the  phase  rolloff, 
but  we  lose  bandwidth 


H2(s) 


50(s+.l) 

(s*4)2 


Phase  margin  =  30°  (Phase  at  "A"  =  Phase  at  "C") 
Gain  margin  =3.2 

Band  width  =  3  radians  (Determined  by  "b") 


Optimal  Design--  with  System  Simplification 


If  we  ignore  the  higher  order  dynamics  of  the  original  system,  we 

2 

can  design  a  full-order  compensator  for  G' (s')  =  0.5/s  using  optimal 
methods.  We  can  then  insert  this  compensator  into  the  full  fourth-order 
system  and  examine  the  results. 

The  resulting  second-order  compensators  arc  characteristically 
similar  to  the  previous  optimal  designs;  the  closcd-lo}>  second-order 
system  and  second -order  compensator  have  a  similar  root  locus: 


When  this  compensator  is  then  introduced  into  the  full -order  system, 
we  get  a  reasonable  design  providing  the  system  bandwidth  is  less  than 
10%  of  the  resonant  frequency.  For  larger  bandwidths,  however,  the 
high  frequency  poles  move  into  the  right  half  plane,  and  the  system  is 
unstable.  (Similar  results  can  be  seen  by  evaluating  the  appropriate 
Nyquist  diagram;  the  magnitude  of  G*H  is  greater  than  1  db  when  the 
phase  passes  through  -180°  .) 

Clearly,  this  approach  fails  because  relevant  information  is  not 
retained  in  the  design  synthesis.  This  information  can  be  included  if 
we  begin  our  system  simplification  by  assuming  we  have  perfect  measure¬ 
ments,  which  in  our  example  means  we  can  measure  state  x ^  directly,  so 
a  full  state  optimal  compensator  would  have  a  third  order  numerator  and 
denominator.  Again,  it  is  unclear  in  general  how  to  then  go  beyond 
the  first-order  reduction  unless: 
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1)  Poles  and  zeroes  happen  to  cancel  in  the  compensator,  or 

2)  we  throw  information  away  (by  ignoring  higher  order  dynamics,  etc.) 

Optimal  Design--  Gradient  Search 
Given 


x  =  Fx  +  Gu  +  Tw 


y  =  Hx  +  v 


where  F  is  of  order  n  ,  we  want  to  choose  f,  g,  h,  k  and  C  to 
mir.mize  the  performance  index 


J  -  lim  --  1  -  E  f  *  (x‘Ax  +  uTBu)dt 
t,-*a  f  o  rl 


where  we  have 


u  =  -Cx 


^  f  X  gu  ♦  K(y-hS) 


The  normal  solution  to  this  problem  would  assume  f  =  F,  g  =  G,  and 
h  =  H,  then  choose  K  and  C  to  minimize  J  . 


X  =  M 

_  A 

X 


If  we  now  write  X  =  equations  (1),  (2),  (4),  and  (5)  can  be 

combined  to  give  the  full-order,  combined  state-estimator  and  compensator: 


-GC1  *i  +  fr  0]  fw 


KH  F-KH-GC 


0  K  v 


u  =  [0  -C]X. 
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and  we  can  use  expression  (3)  to  solve  iteratively  for  K  and  C  . 
Similarly,  x  is  assumed  to  be  of  order  m  less  than  n  .  This 
yields  the  augmented  system  of  order  (n+m)  : 


'f  -gc 

X,  + 

T  o' 

W 

—2 

_KH  f-gC-Kh 

0  K 

V 

Again,  the  quadratic  performance  index  (3)  and  a  gradient  search 
procedure  are  used  to  choose  optimal  K  and  C  for  the  given  X2  . 

Range  of  Results 

Note  first  that  all  four  states  of  the  original  system  can  be 
independently  weighted  (using  the  A  matrix)  and  that  noise  can  be 
injected  into  each  state.  Therefore,  there  are  many  more  independent 
design  parameters  (weighting  functions,  etc.)  than  dependent  parameters; 
in  this  example,  there  are  only  four  degrees  of  freedom  in  H(s)  ,  but 
none  of  the  entries  in  K,  C,  f,  g  or  h  have  been  specified. 

By  proper  selection  of  f,  K,  and  C,  we  can  realize  any  transfer 
function  H(s).  Specifically,  the  choice  of  f  determines  the  zeroes 
of  H(s)  ;  the  choice  of  K  and  C  determine  the  poles  of  H(s) . 
Therefore,  the  gradient  search  technique  can  explore  the  entire  range 
of  desired  transfer  functions. 


idUUBi 
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Rate  of  Convergence 

The  technique  converges  to  a  solution  in  twenty  to  thirty  itera¬ 
tions.  The  entire  run,  for  this  example,  required  approximately  four 
times  as  much  computer  time  as  one  run  of  an  Eigenvector  Decomposition 
routine  for  the  same  example. 

Convergence  is  always  faster  if  the  search  has  more  parameters  than 
degrees  of  freedom.  For  example,  convergence  is  faster  when  K,  C,  and 
f  are  allowed  to  vary.  The  same  transfer  function  for  H(s)  results. 


Examples 

The  results  shown  in  Table  8.1  represent  a  large  range  of  possible 
designs,  with  bandwidths  ranging  from  1.5  radians/sec.  to  5.9  radians/sec. 
All  systems  are  stable,  with  a  phase  margin  of  22°  and  a  gain  margin  of 
about  2.  The  resulting  compensator  is  approximate ly-- 


H(s) 


100  (s^  5) 

(s  +4  +4j ) C s  +4- 4j7 


The  root  locus  for  this  system  is  then: 


>2) 


Pole  locations 

Low  bandwidth 
Hi  bandwidth 


The  optimal  design  is  conceptually  a  refinement  of  the  classical  design, 
and  reasonably  attains  the  performance  limits  for  bandwidth  and  margin. 
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Table  8.1 

SUMMARY  OF  DESIGN  TRIALS 


Explanatory  Notes 
P  ARAMS— 


0(a,b)  -  Steady-state  Design 

a  =  (^/R^  *  noise  ratio  ,  (Q^  =  0) 

b  =  A/B  ,  weighting  ratio,  (A^  =  A2) 

P(a,b)  =  Gradient  Search  Design 

T  —  Run  Time  (fraction  of  a  minute) 

(XX)  -  nunfeer  of  degrees  of  freedom  equals  number 
of  parameters 

XX  -  excess  number  of  free  parameters 
BW  —  Band  width  (radians) 

PM  —  Phase  Margin 

GM  —  Gain  Margin 

H(s)  —  Compensator 


A  reasonable  question  to  ask  is  whether  an  appropriate  choice  of 
weighting  functions  for  the  "Simplification"  design  would  have  yielded 
the  transfer  functions  found  using  the  gradient  search.  The  answer  is 
no  (see  Table  8.2).  Depending  upon  the  compensator  being  matched,  we 
would  need  to  specify  either  a  negative  weighting  matrix,  A  ,  or  a 
negative  covariance.  Thus,  using  simplified  models  and  steady-state 
Riccati  analysis,  we  cannot  realize  all  possible  compensators.  However, 
if  we  use  this  same  model,  G'(s)  ,  as  the  basis  for  the  estimator  in 
the  gradient  search  procedure: 

• 

G'  ( s)  <=>  x  =  f  x  +  gu  +  K(y-hx) 
then  all  reduced-order  transfer  functions  are  realizable. 


Comparing  the  two  'optimal’  procedures  in  the  frequency  domain 
(Figure  8.3  through  Figure  8.5)  we  see  that  without  introducing  the  added 
information  about  the  resonant  peak  at  7  radians,  the  phase  of  the 
compensator  is  inadequately  constrained  to  guarantee  stability.  The 
"Simplification"  design  is  inadequate. 

Conclusions 

1)  Both  techniques  are  comparable  for  low  bandwidth  systems. 

2)  The  gradient  search  procedure  insures  stable  systems. 

3)  The  gradient  search  procedure  requires  more  computer  time. 

We  are  comparing  a  sixth  order  calculation  (a  fourth  order 
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Table  8.2 

SIMPLIFICATION  EQUIVALENCE 

A  compensator  equivalent  to  the  gradient  design,  P(a,b),  could 
be  realized  using  simplification  if  the  design  parameters  are  chosen 
correctly.  For  the  parameters  shown,  0(a,b)  ,  the  appropriate  design 
parameters  are  given. 


PARAMS 

Qx/R 

q2/r 

Aj/B 

V* 

! 

O 

t — t 

CO  * 
O 

r— ) 
V—' 

t  CD 

0 

io3 

10 

10 

0(1,1) 

2.5 

0.42 

3.5 

-3.6 

0(1O3, 1) 

-13.7 

140 

6 

20.1 

0(1O3,1O) 

-27.1 

216 

14 

31.2 

i 


Compensated,  dB 


Frequency,  radians 

<D 

Maximum 

-128  degrees 

© 

Minimum 

-7  db 

© 

Maximum 

36  db 

©  (103,10) 

Steady-State  Design 

Ratios:  =  I 

K 

| G j  Compensated,  db 


Frequency,  radians 

1.9 

- - \ 

Q 

Maximum  -135  degrees 

© 

0 

Minimum  -6  db 

,© 

Maximum  32  db 

P(io3, 

10)  Gradient-Search  Design 

Figure  8.4  'OPTIMAL'  DESIGN  COMPARISON,  TRIAL  II  BODE  PLOTS 


degrees 


0(1U3,1U)  Steady-State  Design 


(P  Phase  margin:  22° 
Gain  margin:  2 


P(103,10) 


Gradient-Search  Design 


Figure  8.S  'OPTIMAL'  DESIGN  COMPARISON,  TRIAL  II  NYQUIST 
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system  augmented  by  a  second  order  compensator)  to  a  fourth 

order  design  using  steady-state  Riccati  analysis.  The  computation 
ratio  was  4  to  1 . 

4)  The  Simplification"  design  cycle  must  be  iterated. 

5)  When  designing  with  "Simplification,"  the  space  of  all  desirable 
compensators  is  not  available. 

o)  riie  gradient  search  procedure  allows  the  designer  to  include  all 
known  information,  to  weight  all  known  states,  and  to  minimize 
the  overall  system  sensitivity  to  parameter  variations.  Since 
the  compensator  is  not  of  full -order,  some  of  this  information 
is  lost.  It  is  not  yet  known  how  this  design  compares  to  the 
full  state  compensation,  or  what  information  is  lost  in  the 
simplification. 

Ive  therefore  have  a  viable  algorithmic  procedure  for  designing  reduced- 
order  compensators.  This  technique  uses  the  same  independent  parameters 
as  full -state  compensator  design--  the  weighting  of  states  and  the 
minimization  of  state  covariance.  System  stability  is  guaranteed,  and 
an  optimal  design  of  any  order  can  be  realized  directly. 
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VII 1 . 4  An  Example 

Having  demonstrated  the  efficacy  of  the  gradient  search  procedure 
when  applied  to  a  hypothetical  problem,  we  shall  now  consider  an 
engineering  application. 

The  problem  is  to  design  a  minimum-order  compensator  for  a 
star-tracking  telescope  (see  Appendix  C2) .  Using  engineering  judgment, 
the  model  for  the  telescope  was  reduced  from  twelfth  order  to  fifth  order. 
With  much  effort,  a  third  order  compensator  was  then  derived.  The  loss 
of  response,  however,  was  significant. 

Figure  8.6  shows  the  results  of  optimizing  the  third  order 
compensator  using  the  gradient  procedure;  the  important  criterion  in 
this  application--  the  speed  of  response--  is  approximately  proportional 
to  the  radial  distance  from  the  origin.  Therefore,  optimization  has 
improved  the  speed  of  response  by  30%. 
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COMPENSATED  STAR-TRACKING  TELESCOPE 
Physical  System  -  5th  order 
Compensation  -  3rd  order 


A 


poles  after  optimization 


□ 


poles  unchanged 


Figure  8.6 


LOCUS  OF  COMPENSATION  OPTIMIZATION 


Chapter  IX 

IMPLEMENTATION  ISSUES 
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Implementation  of  the  algorithms  presented  in  earlier  chapters  is 
generally  straightforward,  but  some  interesting  computational  issues  do 
arise.  These  Issues  will  be  the  focus  of  this  chapter;  specific 
details,  such  as  program  descriptions  and  source  code,  will  follow  in 
a  separate  technical  report. 

The  algorithms  described  in  chapters  III  and  IV  were  implemented 
in  a  hybrid  mixture  of  Fortran  and  C.  Since  many  numerical  analysis 

programs  --  the  QR  algorithm.  Singular  Value  Decomposition,  etc.  -- 
already  exist  in  Fortran,  we  chose  to  use  these  programs  as  written. 

New  programs,  were  written  in  C  ,  a  language  often  better 
suited  to  this  type  of  procedural  programming.  (In  addition,  our 
system  px-ovides  much  better  support  for  developing  C  programs.) 

In  retrospect,  the  concommi tant  advantages  intrinsic  in  using 
two  languages  may  not  have  outweighed  the  disadvantages  of  increased 
complexity  and  of  pioneering  the  interface  between  the  resulting 
bifurcated  program.  Two  Fortran  compilers  and  two  language  interfaces 
later  (and  after  a  tremendous  amount  of  work)  the  language  processors 
were  ultimately  debugged,  and  a  powerful  signal-processing  library 
evolved.  The  diversion  of  effort  required,  however,  was  considerable. 

The  decision  to  develop  a  hybrid  implementation  was  made  too 
lightly.  In  research,  as  in  industry,  decisions  which  involve  signifi¬ 
cant  commitments  of  programming  effort  must  be  made  after  carefully 
considering  the  consequences.  The  requisite  pioneering  required  was 
uniformly  underestimated. 

*C  is  a  language  similar  to  Algol  supplied  by  Bell  Laboratories  with 
their  Unix  operating  system. 


— . rnmmmmmm*  mmmmmmnmr m  ***** 
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I X . 2  Algorithm  Implement atlon 

By  exploiting  the  structure  of  the  arrays,  many  speed  advantages 

can  be  accrued.  For  example,  to  multiply  two  general  matrices  requires 

n3  multiplications  (using  traditional  sequential  algorithms).  To 

multiply  two  lower  triangular  requires  n* (n+1)  (n+2)/6  multiplies; 

for  a  sixteenth  order  system,  this  is  an  80%  reduction. 

In  square  root  algorithms,  we  can  exploit  this  characteristic 

quite  often.  If  the  square  root  of  a  quantity  C  is  L  ,  where 
T 

C  =  LL  _  and  we  normally  compute  with  L  ,  then  constrain  L  to  be 

T 

lower  left  triangular.  If  we  normally  compute  with  L  ,  then  constrain 

T  T 

L  to  be  lower  left  triangular.  In  both  cases,  C  =  LL 

Gaussian  Elimination  can  also  be  used  to  simplify  multiplication. 

If  we  have  four  block  matrices--  A,  B,  C,  D--  and  need  to  calculate 

D  -  CA  *B  ,  this  is  equivalent  to  arranging  the  blocks  in  an  array 

'A  B' 

C  D_ 

and  performing  Gaussian  Elimination  on B  and  D  to  bring  this  array  to 
the  form 


'A  0 
C  E 


E  then  equals  D 


CA~ 1 B  . 


■a*"- 
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IX. 3  Parallel  Implement at ion 

Several  papers  have  been  written  on  using  special  hardware 
configurations  to  assist  in  computing  these  algorithms.  Sameh  and 
Kuck  [Sameh,  1977]  present  a  QR  algorithm  requiring  0(n)  processors 
which  results  in  a  speedup  of  0(n/logon)  . 

Morf,  Dobbins,  Friedlander,  and  Kailath  [Morf,  1978]  made  a  similar 
observation  in  noting  the  applicability  of  parallel  processing  to  square 
root  algorithms.  Recalling  Section  II 1.7,  they  proposed  partitioning  an 
interval  of  data  into  subintervals  within  which  state  estimates  are  cal¬ 
culated  based  only  on  data  within  the  subinterval;  a  separate  processor 
then  calculates  the  estimate  for  each  sub interval.  Only  then  do  the 
processors  need  to  communicate,  to  "connect"  the  subinterva!  "ndpoints. 

These  algorithms  look  extremely  promising,  but  other  alternatives 
are  also  attractive.  For  example,  in  the  doubling  algorithms,  we 
could  perform  Householder  reflections  in  parallel,  with  each  processor 
operating  on  a  different  column  or  a  different  row.  As  we  found  in 
Chapter  VI  after  looking  at  results,  it  is  not  necessarily  obvious 
which  solution  is  preferable.  Memory  dynamics  and  intra-processor 
interactions  can  profoundly  effect  the  efficacy  of  an  algorithm. 

The  interaction  between  algorithms  ..nd  their  hosts,  and  the  ensuing 
synergy,  is  a  largely  unexplored  area,  especially  for  parallel  processing. 
More  research  is  needed  to  identify  the  underlying  structure  of  linear 
system  theory,  to  understand  the  communication  of  information  required 
during  t^e  processing,  and  to  bound  the  error  propagation  as  parallelism 
is  exploited. 
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I X . 4  Special  Hardware 

The  parameter  optimization  program  of  Chapter  VII,  in  solving  a 
fifth  order  problem,  cost  twenty  dollars  to  run  on  the  IBM  370.  The 

4 

algorithm  is  order  n  .  For  a  sixteenth  order  problem,  the  cost  soars 
to  $2,000  per  design  cycle.  The  question  is  whether  to  focus  on  improving 
the  algorithm  or  minimizing  the  hardware  costs. 

The  IBM  370  rents  for  approximately  $1300  per  hour.  Recently,  very 
fast,  sequential  processors  have  been  introduced  which  are  approximately 
one  hundred  times  faster,  and  rent  for  $75  per  hour.  If  one  could  be 
adapted  to  the  task,  the  16th  order  system  could  be  solved  for  $1.15 
instead  of  $2,000.00. 

The  transition,  and  consequent  savings,  is  not  straight  forward 
because  the  fast  sequential  processor  (henceforth  referred  to  as  an  Array 
Processor,  or  AP)  could  not  be  programmed  in  FORTRAN.  Programming  them 
in  their  language  is  so  primitive,  by  current  standards,  that  it  can  only 

be  justified  for  processing  very  large  volume  problems.  (For  example,  a 
Fast  Fourier  Transform  program  was  estimated  to  take  200  hours  of  program¬ 
ming  effort.)  Industry  experience  indicates  we  could  program  and  debug  one 
machine  language  instruction  per  hour;  the  parameter  optimization 
algorithm  is  over  2,000  FORTRAN  statements  in  length. 

As  an  alternative,  the  AP  manufacturer  supplied  FORTRAN  callable 
subroutines  for  nearly  all  vector  operations,  and  for  such  standardized 
computations  as  matrix  inversion  and  Fourier  transforms.  Performance 
data  for  our  system  using  this  software  appear  in  Figure  9.1;  execution 
times  are  shown  for  vector  multiplication  (VMIJL) ,  vector  initialization 
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(VC LI)  ,  Fast  Torn  i<.  r  transforms  (FFT) ,  and  data  transfer  (DATA)  as  a 
function  of  vector  length  A.  .Vote  that  the  crossover  between  host 
and  AP  vector  multiply  occurs  near  X  =  60  .  Our  AP  can  hold  a  90x90 
matrix.  Thus,  manipi  -ting  columns  of  a  matrix  in  the  AP  is  only 
marginally  faster  than  in  our  host  (and  additional  AP  memory  would 
provide  little  improvement). 

Conventional  wisdom  has  held  that  array  processors  tend  to  be 
input-output  limited;  the  array  processor,  as  a  peripheral  device 
attached  to  the  host,  must  transfer  all  of  its  data  to  and  from  the 
host  memory.  For  most  matrix  problems  and  many  vector  problems,  however, 
the  real  limitation  lies  in  the  time  required  for  the  host  to  initiate 
the  next  AP  program.  This  limit  makes  few  tasks  besides  Fourier 
transforms  look  attractive.  As  a  consequence,  despite  diligent  efforts 
at  applying  this  system  to  projects  in  our  laboratory,  we  have  yet  to 
exceed  ninety  seconds  of  AP  useage  per  day  (as  cf  March,  1978). 

These  results  are  consistent  with  measurements  made  at  other 
installations.  Therefore,  if  the  research  potential  of  array  processors 
is  to  be  realized  either  the  host  computer  must  be  dedicated  to  serving 
the  AP,  or  the  AP's  software  and  interface  hardware  must  be  improved 
beyond  what  is  currently  available. 

The  array  processor,  although  capable  of  resolving  the  speed-cost 
problem,  could  not  be  used  because  of  these  weaknesses. 
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Chapter  X 
CONCLUSIONS 

The  important  conclusion  from  the  first  chapters  is  that  square- 
root  doubling  is  an  attractive  alternative  for  solving  the  discrete¬ 
time  Riccati  equations  arising  in  estimation  and  control.  Numerically, 
the  algorithm  compares  favorably  with  the  alternatives:  the  speed  is 
comparable  up  to  tenth  order,  the  memory  requirements  are  comparable,  and 
the  accuracy  is  comparable.  The  primary  advantage  of  square  root  doubling 
is  the  broad  class  of  problems  it  will  solve.  This  expanded  class 
includes  systems  with  singular  dynamics  matrices  [not  handled  by 
eigenvector  decomposition),  and  the  SQD  algorithm  handles  problems  with 
repeated  closed-loop  eigenvalues  sans  difficulty.  As  noted,  eigenvector 
decomposition  can  presumably  be  extended  to  also  handle  these  cases. 

From  a  theoretical  perspective,  a  direct,  algebraic  derivation  of 
the  square  root  doubling  algorithm  was  presented.  The  role  of  orthogonal 
transformations  was  elucidated;  this  included  rank  compression  and  the 
implicit  calculation  of  matrix  inverses.  In  the  process,  several 
questions  concerning  the  structure  of  the  algorithm  were  raised. 

A  pure  scattering  theory  derivation  was  given.  Defining  the  square- 
root  of  a  scattering  matrix--  actually  one  "rail"  of  a  symmetric  network-- 
these  questions  were  readily  solved.  The  J-orthogonal  transformations 
appeared  both  to  unwind  loops  and  to  "compress"  data  paths.  The  [  matrix 
update  arising  in  the  scattering  framework  was  now  clearly  different  from 
the  error  covariances  being  updated  with  the  square  root  algorithm.  Trans¬ 
mission  path  variables  were  not  amenable  to  square  roots,  whereas  vari¬ 
ables  cutting  the  line  of  symmetry  were. 


i  niMTrriT  jHMltfr 


jgiiijgi 


m 


200 

In  the  continuous  case,  eigenvector  decomposition  still  appears 
to  be  the  preferred  choice  for  solving  the  Riccati  equation.  This 
result  follows  because  other  alternatives  either  require 
choosing  a  discretionary  constant  or  inordinately  constrain  the  class 
of  solvable  problems.  This  conclusion  may  change  either  from  the 
introduction  of  new  algorithms  or  by  specifying  a  procedure  for 
selecting  the  discretionary  constant. 

In  examining  systems  with  singular  transition  matrices,  a  published 
example  was  considered,  and  shown  to  be  a  subopt imal  solution  which 
performed  well,  within  the  problem's  context. 

A  discrete-time  parameter-uncertainty  algorithm  was  evolved  from 
previous  continuous  algorithms.  Limitations  on  the  algorithm  were 
found: 

1)  uncertain  modes  had  to  be  observable  through  the  cost 
function,  and 

2)  uncertain  modes  had  to  be  excited  by  external  disturbances. 

This  latter  constraint  arises,  at  least  in  part,  from  simplifying  assump¬ 
tions  made  during  the  derivation;  these  assumptions  led  to  the  introduction 
of  an  unphysical  scale  factor  introduced  in  an  earlier  study  [Hadass,  1974] 

In  applications,  this  approach  elucidated  the  ad  hoc  desensitization 

of  an  earlier  work  [Katz,  1974],  When  applied  to  a  realistic  example, 

it  improved  the  sensitivity  range  by  50%.  The  procedure  was  straight- 

4 

forward,  but  required  0(n  )  computer  time. 

This  algorithm  was  applied  to  the  design  of  arbitrary-order  compensa¬ 
tion  using  full-system  information.  Excellent  results  were  obtained, 
both  for  created  problems  in  the  frequency  domain  and  for  an  actual 
telescope  example.  Again,  the  computation  required  was  high,  but  the 


total  man  hours  required  was  significantly  reduced  compared  to  a 
classical  design. 


The  adaptability  of  system  theory  algorithms  to  specialized  hard¬ 
ware  was  examined  and  found  very  promising.  The  adaptability  of  present 
hardware  to  present  algorithms  was  found  lacking.  Software  research, 
including  new  languages  and  fundamental  numerical  algorithms,  is 
required.  The  area  promising  the  most  return  for  invested  research, 
however,  is  probably  in  hardware,  including  special  processors,  innova¬ 
tive  memory  systems,  and  external  interfaces. 
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APPENDIX  A1 

SOFTWARE  FOR  SOLVING  THE  STEADY-STATE  RICCATI  EQUATION 

Implemented  in  1978  on  a  DEC  PDP11  j 

Part  of  this  software  implements  many  of  the  algorithm;  developed  in 
Chapters  III  and  IV.  Other  programs  provide  the  support  required  to 
properly  interpret  the  computations.  These  routines  cannot  be  considered 
design  tools;  they  were  written  with  research  as  the  prime  objective. 

We  feel,  however,  their  design  philosophy  will  lead  to  effective  design 
tools  as  our  repertoire  of  software  expands. 

We  chose  to  implement  each  separable  portion  of  each  algorithm  as 
an  individual  program,  thereby  minimicing  memory  requirements  and 
cleanly  modularizing  the  software.  For  example,  there  are  separate 
programs  for  calculating  the  requisite  controller  gains,  the  desired 
filter  gains,  the  loop  eigenvalue's,  for  generating  data,  and  for 
plotting  the  results. 

Our  operating  system,  UNIX,  provides  facilities  for  easily  con¬ 
necting  the  output  of  one  program  to  the  input  of  another.  For  example, 
to  generate  the  Kalman  gains  for  a  given  model  (  stored  in  file  "model") -- 
where  we  wish  to  vary  the  noise  parameter,  q--  we  would  feed  the  input 
data  into  a  generator  program  ("gen"),  which  feeds  into  the  square  root 
doubling  algorithm  ("dbl.  dk") ,  then  into  the  eigenvalue  routine  ("eigen"), 
and  then  into  an  editing  routine  which  extracts  the  Kalman  gains  ("strip"). 

On  our  system,  this  is  entered  as  the  command : 

gen  model  iq  q| dbl . blk | eigen | strip  K 

Ten  seconds  later  the  results  appear  at  the  terminal--  and  only  the 
results  requested  (the  gains  K) . 

-  - -  - — — — — — — ^ ^ ^ iwj 


The  currently  available  software  is  listed  in  Table  A. 1.1. 


Program  Modules 

Mainl ine 

The  mainline  code  is  written  in  FORTRAN,  and  does  little  more  than 
call  subroutines. 

» 

Numerical  Analysis 

The  numerical  analysis  routines  closely  follow  the  IMSL*  package, 
and  are  written  in  FORTRAN.  The  routines  used  include  the  QR  algorithm. 
Householder  transforms,  Cholesky  decomposition  (for  taking  square  roots 
of  positive  definite  matrices),  and  a  singular-value-decomposition 
routine,  also  for  taking  matrix  square  roots. 

Input -output 

A  centralized  module  provide'  input  and  output  capabilities. 

Matrix  support 

A  final  module  provides  all  requisite  matrix  operations--  such  as 
matrix  multiply  and  Gaussian  elimination--  on  block  matrices.  Matrices 
are  defined  at  compile  time  to  have  a  prescribed  structure,  e.g., 
to  be  of  the  form: 


P  m  n 


1 

3 

5 

2 

4 

6 

matrix  3 


IMSL  -  International  Mathematics  and  Statistical  Libraries. 


i 

[ 

t 


204 


T-vtufmttf <tirm 


The  programmer  could  then  multiply  block  2  of  matrix  3  by  block  5, 
leaving  the  result  in  block  6  [an  (n  by  p)  times  (p  by  n)  calculation] 
by  writing 


call  mul(3,2,  3,5,  3,6) 

Arguments  come  in  pairs;  the  first  element  specifies  the  matrix,  the 
second  specifies  the  block.  To  facilitate  the  programming  process, 
this  code  was  originally  written  in  C  (an  ALGOL-like  language) ,  and 
was  later  translated  into  FORTRAN. 

Input  Format 

The  input  format  is  nearly  completely  unrestricted.  The  user 
merely  specifies  an  array  name,  followed  by  data  values.  The  input 
routine  checks  for  consistent  dimensions,  and  for  a  complete  input 
set . 

A  nearly  self-explanatory  input  file  appears  in  Figure  A. 1.1. 
Note  that  comments  are  any  string  of  characters  appearing  between 
'/*'  and  comments  are  completely  ignored  by  all  programs. 


•» 


1 


[mp  lement  ed  Sof tw s.  re 
Table  A.  1 . 1 


\ 

f 


NAME 


gen 


eigen 


strip 


FUNCTION 

Take  an  existing  model  and  modify  it  repeatedly, 
according  to  specifications  supplied  from  a  table 

Find  all  appropriate  eigenvalues  (given  tp,  K, 

and  II);  eigen  finds  both  the  open  and  close.!  loop  values 

Delete  unwanted  output 


dbl . dk 
sqr.dk 
ric . dk 
dbl .blk 
dbl .dc 
disc 
optsys 


square  root  doubling,  discrete,  Kalman  gains 
square  root,  discrete,  Kalman  gains 
Riccati  iteration,  discrete,  Kalman  gains 
bilinear  square  root  doubling,  Kalman  gains 
square  root  doubling,  discrete,  controller  gains 
Eigenvector  decomposition,  discrete 
Eigenvector  decomposition,  continuous 


All  additional  software  descriptions,  listings,  etc.,  will  appear  in 
a  separate  technical  report. 


1 


< 

■j 

i 

\ 

'•i 

i 
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/* 


SOLUTION  OF  THE  STEADY  STATE  RICATTI  EQUATION 
Tue  May  23  11:00:45  1978 
Square-Root  Doubling 
Revision  I 
star.  1 


Number  of 

states —  4 

disturbances —  1 

measurements —  1 

*/  © 

/* 

*  Test  of  star  tracking  telescope  model 

* 

*  Two-fold  problem: 

*  1)  Has  repeated  roots,  marginally  stable 

*  2)  Incorporates  two  delays. 


* 

*/ 


print_flag  =  50 
iteration__count  =  30 
timing_cycles  =  10 


phi 

1.0 

1.0 

0.0 

0.0 

1.0 

0.0 

1.0 

0.0 

0.0 

0.0 

0.0 

1.0 

gamma 

0.5 

1.0 

0.0 

0.0 

q 

100. 

0 

h 

0.0 

0.0 

0.0 

r 

1.0 

Note: 

0 

First 

comment  s 

upplied 

Figure 

A.  1.  1 

0.0 

0.0 

0.0 

0.0 


1.0 


by  pre-processor 
SAMPLE  PROGRAM  INPUT  FILE 
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APPENDIX  B1 

DESIGN  OF  A  DISCRETE  ESTIMATOR  FOR  THE 
SPACE  LAB  INFRARED  TELESCOPE 


From  the  paper  by  J.  David  Powell  and  Eric  Parsons,  [Powell,  1978]  we 
have  the  following  model  for  the  gyro  star  tracker  (see  also  Appendix 


d>  x 

n 


1 


+  r 


n 


n-l 


(1) 


v 

'  n 


[1 


0]Vi  + 


n-l 


+  w 


V 

n 


(2) 


We  now  follow  their  development,  elaborating  where  necessary  to  clarify 
issues . 

We  assume  the  following  disturbance  characteristics: 


>c-  (n-nT) 
i  J 

=  N 

6.  . 

e-  (ni) 

=  0 

(3) 

T 

&  (v.v!) 
i  j' 

=  R 

6.  . 
ij 

f(V 

=  0 

(4) 

T 

f.  (w .  w  . ) 

l  j' 

=  Q 

6.  . 
ij 

f  (w.) 

=  0 

(5) 

=  0 

(6) 

f-Cn^T) 

=  0 

(7) 

f- (  v.wT) 

=  0 

(8) 

Replacement  by  Equivalent  System  with  no  Delays 
Applying  (1)  to  (2)  recursively,  we  get 
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Assuming  <p  is  invertible,  this  gives 


n-2 


r2xn  -  <i»_2rnn_1  -  <D_1rn, 


n-2 


y  =  -  4>_2rn  i  -  <J)_1rn  n)  -  v  +  w 

7n  n  n-1  n-2  n  n 


Defining  H  =  H<fi 


-2 


-1, 


v  =  w  -  v  -HFri  ,  -  H<()  Tn  _ 
n  n  n-1  n-2 


y  =  Hx  +  v  (9) 

J  n  n  n 

However,  it  is  no  longer  true  that 

f  (fljVj)  =  0 

f  (v.v.)  =  C  6. .  ,  for  some  constant  C. 
i  J  iJ 

He  begin  uncorrelating  the  noises  by  modifying  the  state  equation. 
Since 


yn  '  fWxn-l  -  firnn-l  '  '  0 

we  can  introduce  a  weighting  matrix,  L  ,  such  that 

x  =  (I-LH)<j>x  +  Ly  +  (I-LH)rn  ,  -  L  v  (10) 

n  n-1  ^n  n-1  n 

Since  Ly^  is  a  deterministic  input,  we  are  free  to  choose  L  so  that 
the  new  process  noise  is  "uncorrelated"  with  the  measurement  noise  v  , 


thus : 
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6{(I-LH)rV) 


&  { (i-LH)Fn  ,-l(w  -v  -firn  1-H<i>'1rn  _Kw  -v  -n  1rTfiT-n  0rVTHT)} 

3  n-1  n  n  n-1  n-2  n  n  n-1  n-2  3 


=  -(I-LH)rNrTHT  -  LQ  -  LR  -  LHrNrTiiT 

-1  T  -T  T 

-  LH<f>  xrNr  <p  h 

T-T  -]  T  -T  -T1 

=  -lW  H  -  L(Q+R+H<J)  rNT  <j>  H  3 


or,  defining  R  such  that 


r  =  <5.  CvivT)  =  q  +  r  +  firNrTHT  +  n<|)'1rNrT<j)"1iir 


=  q  +  r  +  fl(rNrT+(j)rNrT(j)TjnT 


Substituting  (12)  into  (11),  and  setting  (11)  to  zero  yields 


-  (I-LH)rNrTfir  -  LR 

T-T  -  T-T  -  -1 

-  TNr  h  (urNr  h  -r) 

T-T  -1  T-T  -T  -1 

-  TNT  11  (Q+R+H0  TNT  <f>  II  ) 


Next,  we  examine  the  resulting  measurement  and  process  noise  correlation 
functions : 


v  -v  -firn  ,  -H4>~ 1  Tn  ->)(w  ,-v  .  -n  JTHT-n  ,rT4>_THT) 

n  n  n-1  n-2  n-1  n-1  n-2  n-3 


f.(vivj)' 


H4>_1rNrTHT  =  R  (from  (V}) 

R0  ,  if  i=j 
R,  if  |  i-  i  I  =  1 
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-T 


&(VlV2}  =  ‘  ^VV^Vr^'^V^)**  ) 


=  LH<j)-1rNrT  =  n 


1 


=  \  N, 


if  i=j 
if  | i-j | 
else 


g  (v  nT  J 
n  'n-lJ 


=  0 


by  construction 


S  (v  nT) 
n  'n 


H4>'lrnn-1))T} 


T  -T  T  T  — 

-HrNr  <p  ‘h  l  =  c 


01 


s  (v  nT 
'  n  'n-2' 


-Htjj^rNr1  =  c 


10 


f'(v-nj  =  <c 


i  j 


"01 

10 


o 


if  i-j  =  0 
if  i-j  =  2 
else 


Numerical  Results 


0'1  = 

'1 

-T' 

<P~2  = 

'l 

-2T" 

0 

1 

0 

1 

H  =  H<}>~2  =  [1  -2T] 


R0  =  Q  +  R  +  J  t4N 
Rl  =  J  t4n 


L  = 


7n 


r^4  -i 


2  r 


4 

T  N,  -1 


(Q+R+  2_11) 


Nq  =  (I-LH) 


r” 1 1  r  4 


3  ,_-i 


T  / *  T  /2 

3  2 

T  /2  T 


*1  -  L[  -  1 

F  -  3  4  .T  .. 

C01  4  T  L  N 


—  T  ^ 

c10  ■  ‘J  N  y  W 


J3]n 
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4 

Assuming  »  (Q+R) 


L  = 


3 

6/T 


—  5  4 

R0  -  2  T  N 


I  =  3  t4n 

4 


R.  = 


To  Ro 


o 

II 

f5  4 

3l 

_ 

[3  _4 

3  ~3"| 

2  T  5T 

N  ^  = 

4  T 

Tt 

ST3 

10T2 

It3 

2 

3T2 

N  ,  N  = 


—  N 
10  0 


C01  •  "I  1T<  2t3)n 


C-10  ■  T  I1*  2t3]» 


Note  that 

l|S(VivJ+i)(| 'v  0.3  x||g(v.v]')  |!  (14) 

ilg^i^i+l)H^  0,3  II 

lls(vinT)lhl|  s(n.^)|| 

for  many  cases . 

In  conclusion,  for  this  example,  when  process  noise  dominates 
measurement  noise,  the  assumption  that  these  disturbances  are  white 
and  uncorrelated  breaks  down  significantly.  We  saw  in  Chapter  V  that 
this  leads  to  suboptimal  results. 


212 


Appendix  C 

EXAMPLE  PROBLEM  FORMULATIONS 

The  following  appendices  detail  the  problem  formulations 
used  as  examples  in  the  body  of  the  text.  They  are  included  to 
facilitate  replication  of  the  results  presented  earlier. 
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APPENDIX  Cl 

DERIVATION  OF  F-H  AIRCRAFT  EQUATIONS 
A  Synthesis  of  Paul  Katz's  Example  Problem  [Katz,  1974] 

Admonitions: 

The  references  used  to  synthesize  the  FIT  example  have  several 
significant  typographic  errors;  in  some  figures  terms  are  unquestionably 
missing,  the  derivative  terms  are  of  the  wrong  order,  or  the  subscripts 
are  incorrect.  There  is  insufficient  background  information  presented 
in  the  cited  references  to  reconstruct  these  equations;  corrections  were 
normally  made  by  appealing  to  consistency  in  those  cases  where  the 
correction  was  unclear. 

Per ivati on : 

If  the  objective  is  to  accurately  model  the  FH  example  aircraft, 
then  the  best  set  of  equations  and  coefficients  appear  on  pages  256-259 
of  reference  6.  These  equations  incorporate  three  bending  modes,  a  set 
of  flight  conditions  which  don't  match  those  of  Katz,  and  no  wind  gust 
model.  The  wind  gust  can  easily  be  introduced  by  using  Katz's  first 
order  Gaussian  model,  and  the  substitution 


I 


The  sequel  assumes  the  objective  is  to  match  Katz's  results,  using 
his  proffered  equations,  reference  1,  pages  45-50.  This  is  difficult, 
since  the  equations  and  coefficients  are  inconsistence  even  within  the 
section  just  cited;  these  equations  were  actually  drawn  from  several 
diverse  sections  of  references  2  through  6,  they  were  not  copied 
accurately,  and  the  equations  which  were  chosen  as  an  original  basis  had 


ambiguities  which  were  not  resolved  satisfactorily.  These  problems 
will  be  pointed  out  in  the  sequel,  but  no  attempt  will  be  made  to 
analyze  the  impact. 


Iv'e  begin  by  reproducing  Katz's  initial  equations  (pp.  45-50,  Ref.  1) 


Ul  q  -  (Mq)q  *  pycj  .  (jyo,,  .  (p)6e  *  (k„  p  5-7)  *3 


z  ,  \  z,  1  /r  1 

“T  =  “  *  [“op  *(“oT»j  1  *[“o  ;  6  ‘  (“oj  8 


[3]  w  =  -  —  w  +  r  n 

g  ^  \j  g  g 


[4]  x3  =  “b  X4  +  r4  ng 


[5]  x4  =  (-<ub)x3  -  (2/b  up)  x4  +  (up  k2  Z6  )6e  +  (k2  up  ZQ)i 


output 


L6]  +  kp  X4  +  Vq 


[7]  nT  =  U0(q-a)  +  (^)q  ♦  *4  *  v„ 


where 


fl  =  (Y  -  -  W 

T  a  U0  g 


q  -  pitch  rate,  sec 
a  -  angle  of  attack,  rad 


J  2t 
v  w 


T  U 

w  w 


w  -  gust  velocity,  ft/sec 

x  -  bending  mode  velocity  (  rad/sec 
4 
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From  notations  in  reference  2,  Katz  apparently  began  with  the  FH 
simulation  appearing  in  Appendix  M,  pgs.  .382-583.  This  is  a  7th  order 
simulation,  incorporating  two  bending  modes,  presented  with  no  explanation. 
The  higher  order  bending  mode  is  simply  dropped  in  the  derivation  of  the 
fifth  order  model. 

We  have,  removing  obvious  typographic  errors-- 


dropping  the  2nd  bending  mode  and  spoiler  input,  and  making  the  following 
change  of  notation, 
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x  , 

-►  X  - 

a 

->■ 

a 

7 

4 

3 

w 

“la 

x 

-*•  X  . 

a. 

-V 

a 

to. 

4 

i 

1 

T 

-*  T 

2 , 

Z, 

w 

W 

lg 

1 

The  remaining  question  involves  determining  whether  a  ,  in  equations 
£ 8—9 J ,  should  be  or  .  W'e  shall  investigate  this  point  in  a 

moment . 


LIU  q 

a„ 


x4 

[12]  qT 


%(-x3-2/lx4  +  4Z6  5e  +  0.6Zaa) 
e 

2!  x4 

14  r  i 

Zj  e^  +  q  =  q  +  — - —  +  K  (we  assume  K=0) 


UQ(q-a)  +  (Aa)q  +  (~ )x4 


We  begin  by  noting  an  apparent  discrepancy  between  equations  [1]  and 
[11].  Working  now  from  the  equations  given  on  page  257  of  Ref.  6  and 
pages  18-22  of  Ref.  4,  we  get 


rm*  ,  “a  l->  1  c  e^  f 

“T  =  q  *  <0o'  *  vtf'e '  %  4  V"T  *  V  e 


[14]  q  =  (^)q  +  (Ma)aT  +  (Ma)\  +  (M5  )  5e  + 


M*  M 

a  •  a 

H—  W  +  rr--  w 
U0  «  U0  5 
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but  »  —  -- -  ,  therefore  equation  [13]  f  [2] 

0  w  0 

U4]  %  11] 

This  derivation  questions  the  veridicality  of  the  equations  Katz  worked 
with.  Equation  [1]  does  not  match  either  equation  (11)  or  (14). 

The  additional  bending  mode  coupling  term,  found  in  equation  [l],  could 
not  be  found  explicitly  in  any  of  the  references.  Making  a  very  rough 
estimate  of  the  parameter 

Y  =  kb  M<$ 

e 

using  the  data  presented  on  page  258  of  Ref.  6,  we  would  expect  y  to 
be  in  the  range 

10'2  <  y  <  101 

Katz  never  explicitly  cites  the  value  of  y  ,  but  to  match  his  results, 

_  1 

he  apparently  used  the  value  of  10  “  . 

Coefficients:  the  coefficients  Katz  presents  on  page  48,  Ref.  1,  cannot 
be  directly  applied  to  equations  [1-7].  The  translation,  however,  is 
straightforward;we  again  have  the  choice  of  using  Katz's  data,  or  the 
data  appearing  in  the  references--  which  differ  occasionally.  (See 


Table  C.l. 1.) 
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Table  C.1.1  (con't) 


© 


Katz  used  14  feet,  which  is  the  value  given  for  ©• 
Calculated,  however,  the  value  would  be 


£ 

eg 


-  i/12  = 


320.4  -  77.0 

12 


20.3  feet 


( 2 )  The  simulation  gives  a  value  of  0.6  for  k„  .  Katz 

used  0.06. 


© 


Z  =  Z  •  UA  M  =  M  •  U. 
a  to  0  a  to  0 


M*  =  M •  •  U„ 

a  to  0 


©  By  empirical  induction  on  Katz's  data,  Z^  =  -7.23  x  10 


-2 
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Design  parameters: 

Weighting  matrices-- 


Katz  originally  used 
{Ref.  1,  page  78} 


A  = 


0.12 


II 


B  =  1 


This  choice  yields  a  nor.-observable  cost  function;  to  produce 
adequate  sensitivity  margins  he  later  adopted  [Ref.  1,  p.  105] 


\ 


0.12 


11 


0 


0 


B  =  1 


Noise  statistics-- 

Katz  originally  assumed  a  zero  mean  normal  distribution  for  the 
wind  gust  noise  (unit  variance).  This  choice  yields  an  oblivious 
filter,  and  forced  the  introduction  of  T ^  ,  whose  value  appears 
on  page  109  ,  Ref.  1. 

Measurement  noise  statistics  are  given  on  page  85  .  Ref.  1. 

Results : 

Pole  locations  are  given  on  pages  50,  58,  78,  96,  105,  109,  132, 


and  144  „  Ref.  1. 

In  interpreting  these  results,  carefully  determine 
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1)  kb 

2)  T  ,  the  sample  rate 

3)  A 

4)  r4 

With  the  proper  choice  of  these  parameters,  equations  [1-7] 

and  Katz's  coefficients  yield  his  results  to  within  a  few  percent 

Addendum : 

Reference  7  became  available  after  this  writeup  was  completed. 

Longitudinal  equations  for  the  FH  example  occur  in  two  places,  pages 

2-4  to  2-5,  and  page  2-61.  Two  points  need  to  be  made: 

1)  Katz  assumed  Z  =  U  Z  ,  etc.  I  have  not  studied  this 
a  o  w 

derivation,  but  it  should  be  noted  that  the  authors  of  the 
basis  equations--  Sutton,  et  al.--  apparently  used  the 
transformation 


2)  The  conjectures  concerning  the  inconsistency  between  equation 
[1],  111],  and  l 14]  is  reinforced  if  we  look  at  equation  (1) 
on  page  2-61,  where 


q  =  M  q  +  M*  a  +  M(a-a  )  +  M,  5 
M  n  1  a  g  6  e 


a  =  (1+Z  )q  +  Z  (a-a  )  +  Mx  6 

q  M  or  g  6g  e 


where  a  is  the  angle  of  attack  due  to  still  air.  Katz  used 
g  6 


instead, 


q  =  M  q  +  M* (a-a  )  +  M(a-a  )  +  M.  6  ,  etc. 

1  q  av  g  g  6e  e 

but  a  ^  0 
g 
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STAR  TRACKING  TELESCOPE 
(Original  Source:  reference  [Powell,  1978]) 

The  Spacelab  Infrared  Telescope  facility  [Ref.  1]  is  designed 
to  allow  astronomers  to  see  more  than  a  factor  of  ten  deeper  into  the 
universe  than  is  currently  possible  with  ground  based  telescopes.  The 
video- inertial  pointing  technology  for  this  system  has  been  under 
development  at  Ames  Research  Center  for  several  years  [2,3,4].  The 
design  combines  gyro  and  video  information  to  arrive  at  an  attitude 
estimate  which  possesses  the  best  characteristics  of  each  sensor  [5], 

The  example  in  this  appendix  focuses  on  the  estimation  of  the 
gyro  drift  using  delayed  measurements  from  a  video  star-tracker.  We 
will  assume  the  video  sensor  and  the  processor  implementing  the  filter 
are  synchronous  in  operation,  and  have  comparable  delays.  The  analysis 
is  for  steady-state,  single-axis  fine  pointing  of  the  telescope;  slew 
and  other  induced  disturbances  are  not  considered. 

Although  the  gyro  noise  is  not  actually  white  in  practice,  after 
sampling  the  disturbance  noise  is  apparently  uncorrelated  [5]. 

The  filtering  problem  and  gyro  noise  model  appear  in  Figure  C2.1, 
and  can  be  modeled  as  a  two  state  system: 
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Figure  C.2.1  OPTIMAL  DRIFT  ESTIMATION  AND  GYRO  NOISE  MODEL 
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where 

T  =  the  star  tracker  integration  time  and  sample  time  in  seconds 

9^n  =  the  random  walk  drift  error  of  the  rate  integrating  gyro 
output  at  time  nT  seconds; 

D  =  the  random  walk  drift  rate  at  nT  seconds; 
n 

w  =  a  discrete  white  random  sequence  driving  the  random  walk 
drift  rate  (units  are  arcseconds/second 1 ; 

z  =  the  delaved  drift  error  measurement  in  arcseconds; 
n 

v  =  the  discrete  random  noise  in  the  star  tracker  measurement 
in  arcseconds; 

ri  =  the  discrete  random  noise  in  the  RIG  measurement  resulting 
from  high  frequency  rate  noise  (units  are  arcseconds) . 

The  covariances  of  the  discrete  random  sequences  are  defined  as: 
kl  -  2 

N  =  <wn  >  ^  related  to  the  rms  gyro  drift 

2 

R  =  <v^“>  .  mean  square  star  tracker  error 

2 

Q  =  <n  >  ,  mean  square  high  frequency  RIG  (Rate  Integrating 

Gyro)  output  noise. 

where  <>  denotes  an  ensemble  average.  IVe  further  assume  that  all 
noise  sources  are  uncorrellated,  that  is 


V  > 

=  0 

n 

n 

<w 

n  > 

=  0 

n 

n 

<v 

n  > 

=  0 

n 

n 

and  we  further  assume  the  variables  are  ccro  mean. 

For  the  example  used  in  Chapter  V,  the  following  coefficients 
were  used: 

T  =  1  second 

W  =  0.5  arcsecond/socon  ,  or  varied 
(R+Q)  =  1.0  arcs.-conds 
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Appendix  C3 
PAPER  MACHINE  MODEL 


Control  of  a  pressurized  flow-box  for  a  paper  machine. 

This  example  was  originally  presented  by  MacFarlane  [MacFarlane,  1972], 
and  subsequently  was  written  up  by  A.E.  Bryson  [personal  notes,  11/72]. 
Later,  J.D.  Powell  modified  the  parameters,  as  an  academic  exercise,  to 
couple  the  states. 


H  =  h  +  p*—  =  total  head  at  slice  (perturbation  in) , 
s8 

h  =  perturbation  in  stock  level, 
u&  =  perturbation  in  air  valve  opening, 


ug  =  perturbation  in  stock  valve  opening, 


(  )  denotes  mean  values 


Plant  Model 


a^'s  and  b^'s  constants 


I 


Control  Objectives 

(a)  To  keep  H  and  h  near  zero  in  presence  of  disturbances  using  a 
small  range  of  available  u  ,  u 

3  S 

(b)  To  be  able  to  command  small  changes  in  H  and  h  separately  with 
quick  response  and  good  steady-state  accuracy. 

Numerical  Example 

For  a^  =  .395  sec  ^  -  -0115  sec  ^  ,  a^  =  .011  sec  ^ 

=  1.038  inches  of  ^0  per  sec  per  unit  of  ug 

b?  =  .0336  inches  of  1^0  per  sec  per  unit  of  ug 

b^  =  .000966  inches  of  per  sec  per  unit  of  ug 

and  the  choice  that  we  are  willing  to  commit  100  units  of  u^  for 

H  =  1  inch  H20  and  1000  units  of  ug  for  h  =  1  inch  H^O  , 

and  the  choice  that  we  want  the  system  to  "settle"  for  an 

within  2  sec.  and  for  an  h  command  in  5  sec.,  and  the  choice 

c  - 

that  levels  of  H  and  h  be  about  the  same,  a  good  choice 
of  weighting  parameters  for  the  quadratic  performance  is: 

A,  =  1,  A  -  -5  sec  *  A  -  yr  sec  ,  B  -  (tqq)  ,  B  -  2 

"  H  n  ■  1UU  (1000) 1 

Powell's  modified  version: 


air  -7> 


u  +u 
a  a 


stock 


Vus 


p 4 

P 

! 

h  + 

h 

Stock  is  deposited 
on  endless  belt 
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r.  1 

“ 

- 

— 

H  1 

-0.2 

+0.1 

+1 

H 

0 

1 

. 

h 

= 

-0.05 

0 

0 

h 

+ 

0 

0.7 

• 

u 

a 

-  - 

0 

0 

u 

a 

k—  - 

1 

0 

where  H  = 

h  = 
u 

a 

u  = 
c 

u  = 
s 

(')  = 


h  +  rr*—  =  total  head  perturbation,  in  meters 
Ps8 


stock  level  perturbation,  in  meters 
perturbation  in  air  valve  opening 

command  value  to  air  valve,  in  /sec 

perturbation  in  stock  valve  opening,  in  kg/sec 


denotes  mean  value 


y  = 


"l 

o 


o 

l 


This  system  was  discretized  using  a  sample  rate  of  five  Hertz  to 
produce  the  following  model  (from  Abbas  Emami) : 


i+1 

=  4>x.  +  F1u.  +  row. 

i  1  l  2  i 

fifw.w.)  = 
1  J 

yi 

=  Hx .  +  v. 

l  l 

S(ViVP)  = 

J  =  Z(xTax.  +  uTbu . ) 

li  li 

where  is  the  deterministic  input.  The  parameters  are 


P 


l 


k 
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0 


0.96069 

-.0098023 

0 


0.019605 

0.9999 

0 


0.17757 

-.00092396 

0.81873 


".0184  79 

.19743" 

"l 

0 

o" 

-6.2  x  10“5 

.139 

r2  " 

0 

1 

0 

_. 18127 

0 

_0 

0 

1 

H 


1  0  0" 
0  10 


0  5 

0  0 


0 

0 

1 


x  10- 


0 

1 


Q  = 


1 

o 

0 


0 

1 

0 


0 

0 

1 


10 


-3 


0 

1 


x  10 
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Appendix  C4 

STANFORD  RELATIVITY  SATELLITE  [Hadass,  19 7A] 


The  plant  used  in  this  example  was  originally  investigated  by 
Bull  [Ref.  l] .  The  governing  equations  for  an  approximate  model  are: 

1^  0  =  k<{)  +  vi 

I9(<{)+0)  =  -k<f>  +  U  -  +  u'2 

and  a  pictorial  representation  can  be  found  in  Figure  C.A.l. 

From  Hadass,  we  get  the  following  state  space  description: 


X, 

"0' 

1 

0 

2 

— 

X3 

a 

• 

_XA_ 

a 

- 

r  - 

- 

- 

0 

1 

0 

0 

0 

0 

0 

x  = 

0 

0 

2 

aw 

o 

ba/I2 

x  + 

0 

u  + 

1 

0 

0 

0 

0 

1 

0 

0 

0 

0 

_ 

0 

2 

-w 
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-b“/Io 

J 

y\ 

0 

1 

y  = 

[1 

0 

0 

0]w  +  v 

2  2 

where  w  =  k/I0,  aw  =  k/I, ,  and  I_  =  (I. I0) /(I..+I,) .  Values  chosen 

O  JO  1  j  i.  i  ^ 


-  19.375  sec  ^ 

ba/I,  =  1.3175  x  lO-2 

o 

2  K  “2 

U)  =  25  sec 

-ba/I  =  -1.7  x  10~2 

I2  =  250  kg  m2 


included 


aril 


§(ww  )  =  Q  = 


6.6  0 

0  2.0 


-2  -  -3 

x  10  [rad  sec  ] 


fe(w^)  =  R  =  5.4  x  10  ^  rad3  sec 


The  cost  function  weighting  matrices  were 


A  = 


fl.2  x  104 


0 

0 

0 


0  0 

5  0 

0  6.5  x  105 

0  0 


0 

0 

0 

6.5  x  103 


Note  that  this  data  was  chosen  to  match  Hadass's  results,  and  differ 
in  details  from  the  models  he  describes. 


. . 


wtw  •  ' ' 
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